Papers by Renato Figueiredo
IEEE Transactions on Nanotechnology, 2006
This paper describes techniques for establishing private distributed file system sessions for computational grids. These techniques build on previous work on proxy-based virtualization of Network File Systems (NFS); novel in this paper are the support for multiple proxies, encrypted ...
In this paper we introduce Social VPNs, a novel system architecture which leverages existing social networking infrastructures to enable ad-hoc VPNs that are self-configuring and self-managing, yet maintain security against untrusted parties. The key principles in our approach are: (1) self-configuring virtual network overlays enable seamless bi-directional IP-layer connectivity among parties linked by means of social connections; (2) social networking infrastructures greatly facilitate the establishment of trust relationships among parties, and these can be seamlessly integrated with existing public-key cryptography implementations to authenticate and encrypt traffic flows on overlay links end-to-end; and (3) knowledge of social connections can be used to improve the performance of overlay routing.
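As a rough illustration of the trust-establishment idea in this abstract, the sketch below accepts an overlay link only if the public key offered on the link matches the key published via the peer's social-network profile. The helper names and fingerprint format are assumptions for illustration, not the Social VPN implementation.

```python
# Illustrative only: compare a key obtained via the social link against the key
# offered on the overlay connection; accept the link only if they match.
import hashlib

def fingerprint(public_key_pem: bytes) -> str:
    """Short, human-comparable digest of a peer's public key (assumed PEM bytes)."""
    return hashlib.sha256(public_key_pem).hexdigest()[:16]

def accept_peer(key_from_social_profile: bytes, key_offered_on_link: bytes) -> bool:
    """Accept the overlay link only if the offered key matches the published one."""
    return fingerprint(key_from_social_profile) == fingerprint(key_offered_on_link)
```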
Wide-Area Overlays of Virtual Workstations (WOWs) have been shown to provide an excellent infrastructure for deploying high-throughput computing environments on commodity desktop machines by (1) offering scalability to a large number of nodes, (2) facilitating the addition of new nodes even if they are behind NATs/firewalls, and (3) supporting unmodified applications and middleware. However, deployment of WOWs from scratch still requires setting up a bootstrapping network and managing centralized DHCP servers for IP address management. In this paper we describe novel techniques that allow multiple users to create independent, isolated virtual IP namespaces for their WOWs without requiring a dedicated bootstrapping infrastructure, and to provision dynamic host configuration (e.g. IP addresses) to unmodified DHCP clients without requiring the setup and management of a central DHCP server. We give qualitative and quantitative arguments to establish the feasibility of our approach.
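One way to picture per-user virtual IP namespaces without a central DHCP server is to derive each node's address deterministically from its identifier inside a private prefix. The sketch below is only an illustration of that general idea, not the provisioning mechanism described in the paper; the prefix and hashing scheme are assumptions.

```python
# Illustration of the general idea only: the private prefix and hashing scheme
# are assumptions, not the provisioning mechanism described in the paper.
import hashlib
import ipaddress

NAMESPACE = ipaddress.ip_network("10.128.0.0/16")   # hypothetical per-WOW namespace

def virtual_ip_for(node_id: str) -> ipaddress.IPv4Address:
    """Map a node identifier to a stable host address inside the namespace."""
    digest = hashlib.sha256(node_id.encode()).digest()
    offset = int.from_bytes(digest[:4], "big") % (NAMESPACE.num_addresses - 2)
    return NAMESPACE.network_address + 1 + offset

print(virtual_ip_for("node-alpha.example.org"))
```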
The In-VIGO approach to Grid computing relies on the dynamic establishment of virtual grids on which application services are instantiated. In-VIGO was conceived to enable computational science to take place In Virtual Information Grid Organizations. Having had its first version deployed in July 2003, the In-VIGO middleware is currently used by scientists from various disciplines, a noteworthy example being the computational nanoelectronics research community (http://www.nanohub.org). All components of an In-VIGO-generated virtual grid (machines, networks, applications and data) are themselves virtual, and services are provided for their dynamic creation. This article reviews the In-VIGO approach to Grid computing and overviews the associated middleware techniques and architectures for virtualizing Grid components, using services for the creation of virtual grids, and automatically Grid-enabling unmodified applications. The In-VIGO approaches to the implementation of virtual networks and virtual application services are discussed as examples of Grid-motivated approaches to resource virtualization and Web-service creation.
Application awareness is an important factor in efficient resource scheduling. This paper introduces a novel approach for application classification based on Principal Component Analysis (PCA) and the k-Nearest Neighbor (k-NN) classifier. This approach is used to assist scheduling in heterogeneous computing environments. It helps to reduce the dimensionality of the performance feature space and to classify applications based on the extracted features. The classification considers four dimensions: CPU-intensive, I/O- and paging-intensive, network-intensive, and idle. Application class information and statistical abstracts of the application behavior are learned over historical runs and used to assist multi-dimensional resource scheduling. This paper describes a prototype classifier for application-centric Virtual Machines. Experimental results show that scheduling decisions made with the assistance of the application class information improved system throughput by 22.11% on average for a set of three benchmark applications.
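A minimal sketch of the PCA-plus-k-NN classification step described in this abstract, using synthetic performance features and labels; the paper's actual feature set, training data and scheduler integration are not reproduced here.

```python
# Illustrative sketch only: feature names, data and labels are synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

CLASSES = ["cpu_intensive", "io_paging_intensive", "network_intensive", "idle"]

rng = np.random.default_rng(0)
X_train = rng.random((200, 8))           # 200 historical runs, 8 raw performance metrics
y_train = rng.integers(0, 4, size=200)   # synthetic labels indexing into CLASSES

# Reduce the dimensionality of the feature space with PCA, then classify with k-NN.
classifier = make_pipeline(PCA(n_components=3), KNeighborsClassifier(n_neighbors=5))
classifier.fit(X_train, y_train)

new_run = rng.random((1, 8))             # metrics sampled from a new application run
print("predicted class:", CLASSES[classifier.predict(new_run)[0]])
```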
This paper describes WOW, a distributed system that combines virtual machine, overlay networking and peer-to-peer techniques to create scalable wide-area networks of virtual workstations for high-throughput computing. The system is architected to: facilitate the addition of nodes to a pool of resources through the use of system virtual machines (VMs) and self-organizing virtual network links; maintain IP connectivity even if VMs migrate across network domains; and present to end-users and applications an environment that is functionally identical to a local-area network or cluster of workstations. We describe a novel, extensible user-level decentralized technique to discover, establish and maintain overlay links to tunnel IP packets over different transports (including UDP and TCP) and across firewalls. We also report on several experiments conducted on a testbed WOW deployment with 118 P2P router nodes over PlanetLab and 33 VMware-based VM nodes distributed across six firewalled domains. Experiments show that the latency in joining a WOW network is on the order of seconds: in a set of 300 trials, 90% of the nodes self-configured P2P routes within 10 seconds, and more than 99% established direct connections to other nodes within 200 seconds. Experiments also show that the testbed delivers good performance for two unmodified, representative benchmarks drawn from the life-sciences domain. The testbed WOW achieves an overall throughput of 53 jobs/minute for PBS-scheduled executions of the MEME application (with an average single-job sequential running time of 24.1 s) and a parallel speedup of 13.5 for the PVM-based fastDNAml application. Experiments also demonstrate that the system is capable of seamlessly maintaining connectivity at the virtual IP layer for typical client/server applications (NFS, SSH, PBS) when VMs migrate across a WAN.
Computing Research Repository, 2006
Peer-to-peer (P2P) networks have mostly focused on task-oriented networking, where networks are constructed for single applications such as file sharing or DNS caching. In this work, we introduce IPOP, a system for creating virtual IP networks on top of a P2P overlay. IPOP enables seamless access to Grid resources spanning multiple domains by aggregating them into a virtual IP network that is completely isolated from the physical network. The virtual IP network provided by IPOP supports deployment of existing IP-based protocols over a robust, self-configuring P2P overlay. We present implementation details as well as experimental measurement results taken from LAN, WAN, and PlanetLab tests.
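The following sketch conveys the encapsulation idea behind IP-over-P2P: packets of the virtual IP network are wrapped in datagrams addressed to an overlay peer. The peer address, port and 4-byte header are hypothetical; IPOP's actual wire format and P2P routing are not reproduced.

```python
# Conceptual only: the peer address, port and 4-byte virtual-destination header
# are assumptions; IPOP's real wire format and P2P routing are not shown.
import socket

OVERLAY_PEER = ("203.0.113.10", 40000)   # hypothetical overlay endpoint

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 40000))

def send_virtual_packet(ip_packet: bytes, virtual_dst: str) -> None:
    """Wrap a virtual-IP packet in a UDP datagram addressed to the overlay peer."""
    sock.sendto(socket.inet_aton(virtual_dst) + ip_packet, OVERLAY_PEER)

def receive_virtual_packet() -> tuple:
    """Receive one encapsulated packet; return (virtual destination, payload)."""
    data, _addr = sock.recvfrom(65535)
    return socket.inet_ntoa(data[:4]), data[4:]
```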
Computational grids provide computing power by sharing resources across administrative domains. This sharing, coupled with the need to execute untrusted code from arbitrary users, introduces security hazards. This paper addresses the security implications of making a computing resource available to untrusted applications via computational grids. It highlights the problems and limitations of current grid environments and proposes a technique that employs run-time monitoring and a restricted shell. The technique can be used for setting up an execution environment that supports the full legitimate use allowed by the security policy of a shared resource. Performance analysis shows an execution-overhead improvement of up to 2.14 times for shell-based applications. The approach proves effective and provides a substrate for hybrid techniques that combine static and dynamic mechanisms to minimize monitoring overheads.
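As a simple illustration of the restricted-shell idea, the sketch below executes only whitelisted commands. The whitelist and error handling are assumptions, and the paper's run-time monitoring component is not modeled.

```python
# Illustrative policy only: the whitelist and error handling are assumptions and
# the paper's run-time monitoring component is not modeled.
import shlex
import subprocess

ALLOWED_COMMANDS = {"ls", "cat", "grep", "wc"}   # hypothetical whitelist

def run_restricted(command_line: str) -> int:
    """Execute a command only if its program name is on the whitelist."""
    argv = shlex.split(command_line)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not permitted: {command_line!r}")
    return subprocess.run(argv, shell=False).returncode

print(run_restricted("wc -l /etc/hostname"))
```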
Profiling the execution phases of an application can lead to optimizing the utilization of the underlying resources. This is the thrust of this paper, which presents a novel system-level prototype for application resource-demand phase analysis and prediction to support on-demand resource provisioning. The phase profile learned from historical runs is used to classify and predict phase behavior using a set of algorithms based on clustering. The process takes into consideration the application's resource consumption patterns, pricing schedules defined by the resource provider, and penalties associated with service-level agreement (SLA) violations.
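A small sketch of clustering-based phase analysis under simplifying assumptions: windows of resource usage are clustered into phases, and the next phase is predicted from the most frequent successor in the historical sequence. The data is synthetic, and pricing schedules and SLA penalties from the paper are not modeled.

```python
# Simplified sketch: identify phases by clustering resource-usage windows and
# predict the next phase from the most frequent successor in the history.
from collections import Counter, defaultdict

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
usage = rng.random((300, 3))   # hypothetical windows of (cpu_util, mem_util, io_rate)

phases = KMeans(n_clusters=4, n_init=10, random_state=1).fit(usage).labels_

successors = defaultdict(Counter)
for current, following in zip(phases[:-1], phases[1:]):
    successors[current][following] += 1

def predict_next_phase(current_phase) -> int:
    """Return the most frequent successor observed after the current phase."""
    return successors[current_phase].most_common(1)[0][0]

print("predicted next phase:", predict_next_phase(phases[-1]))
```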
This paper elaborates on mechanisms by which users, data, and applications can be decoupled from individual computers and administrative domains. The mechanisms, which consist of logical user accounts and a virtual file system, introduce a layer of abstraction between the physical computing infrastructure and the virtual computational grid perceived by users. This abstraction converts compute servers into interchangeable parts, allowing a computational grid to assemble computing systems at run time without being limited by the traditional constraints associated with user accounts, file systems, and administrative domains. The described approach has already been deployed in the context of PUNCH, the Purdue University Network Computing Hubs, and is unique in its ability to integrate unmodified applications (even commercial ones) and existing computing infrastructure into a heterogeneous, wide-area network computing environment.
Virtual machines provide flexible, powerful execution environments for Grid computing, offering isolation and security mechanisms complementary to operating systems, customization and encapsulation of entire application environments, and support for legacy applications. This paper describes a Grid service, VMPlant, that provides for automated configuration and creation of flexible VMs that, once configured to meet application needs, can subsequently be copied ("cloned") and dynamically instantiated to provide homogeneous execution environments across distributed Grid resources. In combination with complementary middleware for user, data and resource management, the functionality enabled by VMPlant allows problem-solving environments to deliver Grid applications to users with unprecedented flexibility. VMPlant supports a graph-based model for the definition of customized VM configuration actions, with partial graph matching, VM state storage and "cloning" for efficient creation. This paper presents the VMPlant architecture, describes a prototype implementation of the service, and presents an analysis of its performance.
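To give a flavor of graph-based configuration reuse, the sketch below reduces the idea to matching a chain of configuration actions against a cached image and reusing the longest common prefix. Action names are hypothetical, and this simplification does not capture VMPlant's actual partial graph matching.

```python
# Simplification of partial graph matching: configuration actions are treated as
# a linear chain, and a cached image is reused for the longest common prefix.
from typing import List

def reusable_prefix(requested: List[str], cached: List[str]) -> int:
    """Number of leading configuration actions already applied in a cached image."""
    count = 0
    for requested_action, cached_action in zip(requested, cached):
        if requested_action != cached_action:
            break
        count += 1
    return count

request = ["install_base_os", "install_mpi", "install_app", "configure_network"]
cached_image = ["install_base_os", "install_mpi"]
print("actions to replay on clone:", request[reusable_prefix(request, cached_image):])
```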
Computing Research Repository, 2008
This paper introduces Archer, a community-based computing infrastructure supporting computer architecture research and education. The Archer system builds on virtualization techniques to provide a collaborative environment that facilitates sharing of computational resources and data among users. It integrates batch scheduling middleware to deliver high-throughput computing services aggregated from resources distributed across wide-area networks and owned by different participating entities in a seamless manner. The paper discusses the motivations that have led to the design of Archer, describes its core middleware components, and presents an analysis of the functionality and performance of the first wide-area deployment of Archer running a representative computer architecture simulation workload.
We advocate a novel approach to grid computing that is based on a combination of "classic" operating system level virtual machines (VMs) and middleware mechanisms to manage VMs in a distributed environment. The abstraction is that of dynamically instantiated and mobile VMs that are a combination of traditional OS processes (the VM monitors) and files (the VM state). We give qualitative arguments that justify our approach in terms of security, isolation, customization, legacy support and resource control, and we show quantitative results that demonstrate the feasibility of our approach from a performance perspective. Finally, we describe the middleware challenges implied by the approach and an architecture for grid computing using virtual machines.
Cluster Computing, 2006
This paper presents a data management solution which allows fast Virtual Machine (VM) instantiation and efficient run-time execution to support VMs as execution environments in Grid computing. It is based on novel distributed file system virtualization techniques and is unique in that: (1) it provides on-demand cross-domain access to VM state for unmodified VM monitors; (2) it enables private file system channels for VM instantiation by secure tunneling and session-key based authentication; (3) it supports user-level and write-back disk caches, per-application caching policies and middleware-driven consistency models; and (4) it leverages application-specific meta-data associated with files to expedite data transfers. The paper reports on its performance in wide-area setups using VMware-based VMs. Results show that the solution delivers performance over 30% better than native NFS and, with warm caches, it can bring the application-perceived overheads below 10% compared to a local-disk setup. The solution also allows a VM with 1.6 GB virtual disk and 320 MB virtual memory to be cloned within 160 seconds for the first clone and within 25 seconds for subsequent clones.
Virtual Machines are becoming increasingly valuable for resource consolidation and management, providing efficient and secure resource containers along with desired application execution environments. This paper focuses on the VM-based resource reservation problem, that is, the reservation of CPU, memory and network resources for individual VM instances, as well as for VM clusters. In particular, it considers the scenario where one or several physical servers need to be vacated to start a cluster of VMs for dedicated execution of parallel jobs. VMs provide a primitive for transparently vacating workloads through migration; however, the process of migrating several VMs can be time-consuming and needs to be estimated. To achieve this goal, this paper seeks to provide a model that can characterize the VM migration process and predict its performance, based on a comprehensive experimental analysis. The results show that, given a certain VM's migration time, it is feasible to predict the time for a VM with other configurations, as well as the time for migrating a number of VMs. The paper also shows that migration of VMs in parallel results in shorter aggregate migration times, but with higher per-VM migration latencies. Experimental results also quantify the benefits of buffering the state of migrated VMs in main memory without committing to hard disks.
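The sketch below shows the kind of simple predictive model this abstract alludes to: fit measured migration times against VM memory size and extrapolate to other configurations, with parallel migrations modeled as sharing bandwidth. The numbers and the model form are illustrative assumptions, not results from the paper.

```python
# Illustrative model: measured (memory size, migration time) pairs are fit with
# a linear relation and extrapolated; parallel migrations are assumed to share
# bandwidth, so per-VM latency grows while aggregate time shrinks vs. sequential.
import numpy as np

mem_mb = np.array([256.0, 512.0, 1024.0, 2048.0])   # hypothetical VM memory sizes
mig_s = np.array([6.1, 10.8, 20.5, 39.7])           # hypothetical measured times (s)

slope, intercept = np.polyfit(mem_mb, mig_s, 1)      # time ~= intercept + slope * memory

def predict_migration_time(memory_mb: float, n_parallel: int = 1) -> float:
    """Predicted per-VM migration time when n_parallel VMs migrate concurrently."""
    return intercept + slope * memory_mb * n_parallel

print(f"single 1.5 GB VM: {predict_migration_time(1536):.1f} s")
print(f"per-VM, 4 in parallel: {predict_migration_time(1536, n_parallel=4):.1f} s")
```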
A key challenge faced by large-scale, distributed applications in Grid environments is efficient, seamless data management. In particular, for applications that can benefit from access to data at variable granularities, data management can pose additional programming burdens on an application developer. This paper presents a case for the use of virtualized distributed file systems as a basis for data management for data-intensive, variable-granularity applications. The approach leverages on-demand transfer mechanisms of existing, de facto network file system clients and servers that support transfers of partial data sets in an application-transparent fashion, and complements them with user-level performance and functionality enhancements such as caching and encrypted communication channels. The paper uses a nascent application from the medical imaging field (Light Scattering Spectroscopy, LSS) as a motivation for the approach, and as a basis for evaluating its performance. Results from performance experiments that consider the 16-processor parallel execution of LSS analysis and database generation programs show that, in the presence of data locality, a virtualized wide-area distributed file system set up and configured by Grid middleware can achieve performance levels close to that of a local disk (13% overhead or less), and superior (up to 680% speedup) to non-virtualized distributed file systems.