
Technologies for Sharing and Collaborating on the Net

2001, Peer-to-Peer Computing, 2001. Proceedings. First …


David Barkai
Technology Research Labs, Intel Corporation
[email protected]

Abstract

This paper is an overview of the technologies that are needed for Peer-to-Peer (P2P) computing. It represents a high-level overview of material covered in much greater detail in [1]. P2P means different things to different people, and we start by defining P2P and categorizing applications that use P2P. Server-mediated P2P is included as a legitimate P2P architecture. The application space is divided into three types: distributed computing, content sharing, and collaboration. After a discussion of what is required from an infrastructure for P2P, we describe some solutions that each address a class of applications, but also include a start towards a general-purpose architecture. The same basic services are needed by each platform and application, and all will benefit from common middleware. This is one way, though not the only one, to achieve interoperability that will be useful to users and application developers, in particular when we reach the integrated-applications stage. Regardless of how the infrastructure is constructed, the technical topics that need to be addressed are the same: communication through gateways and firewalls; naming and discovery of peers and resources when DNS is not enough; availability and persistence of resources in spite of intermittent connectivity and dynamic presence; security in a distrustful environment and locally managed systems; and management of distributed resources in a heterogeneous setting. There are a number of interesting ways in which people have applied P2P technologies. We give an example in each of the three application categories. The final segment of the talk summarizes the challenges that need to be overcome if P2P is to be widely adopted. The major technical challenges are direct communication, security, naming, and interoperability. There are also social and cultural barriers. These are the result of decentralization and the loss of direct control over resources. They also have to do with trust, reputation, and online communities. Nonetheless, there are some promising activities around the industry - working groups and initiatives, and a great opportunity.

1. The scope of peer-to-peer computing

Peer-to-peer (P2P) is not a new invention. It is based on technologies and techniques that have existed since the early days of the Internet. In this sense, the idea that nodes on the network communicate directly with each other is just going back to those early days in the 70's when the Internet was limited to researchers in a few selected laboratories. The term P2P was coined more as a way to say what P2P isn't than to clearly define P2P. P2P, as a computing model, is different from the client-server model, though the two models can certainly coexist within a single usage model.

1.1 What is P2P

P2P computing, for the purpose of this paper, is a computing model. We will discuss later what kinds of computing, or processing, can be done using this model. So far, there is no agreement on a succinct definition of P2P. People also disagree on the scope of applications that may be "true" P2P applications. Because this new computing model is rapidly evolving, it is healthy and desirable not to be locked down by a rigid definition. The introduction of any new technology introduces surprises, and we need to accommodate them.
Instead of a formal definition, it is useful to articulate the basic elements that are characteristic of P2P computing:

The "action" - computing, file sharing, communication, etc. - takes place at the edge of the Net, where the users and their devices are.

Resources are being shared. They might be computing cycles, storage, network bandwidth, content, and more.

Direct communication between peers - PCs or other devices at the edge of the Net - is almost always present. Though it seems that direct communication between peers is the first requirement of P2P, there is an application that presents a notable exception. Distributed computing, as done by SETI@home for example, involves no communication between peers whatsoever. We would still like to include it in the P2P space, since PCs offer service and share a resource with others.

Another approach that avoids the need for a formal definition of P2P is the following simple litmus test: when your application utilizes resources on other systems that are located at the outer edges of the network, then you are performing P2P computing. This also entails using specific P2P services for operations such as naming, direct exchanges, resource management, etc.

1.2 P2P applications

There are many types and variants of applications that employ P2P technologies, and some examples are presented in Section 4 of this paper. Here, I describe my choice for categorizing such applications. This is needed in order to provide context for the architecture discussion that follows. Any P2P application that I can think of falls into one of these three categories:

Distributed Computing
Content Sharing
Collaboration

Most, but not all, will belong to exactly one category.
In distributed computing, the compute power, or "cycles," is the resource that is being shared. Direct exchanges between peers are typically limited or non-existent. The shared computation is generally managed by a central system. Some of the larger computations are being carried out on PCs scattered throughout the Internet. But, more and more, we see businesses using idle cycles of desktops within the enterprise local network.

The content sharing category includes several different application types. The most common one is file sharing - a term that really means passing on a copy of a file. Content can be shared through access to files on other systems. Storage is shared when jointly-owned files (documents) reside on a peer system. File distribution is another form of content sharing. P2P content distribution will employ peers to decentralize the distribution process.

Collaboration involves humans. In these applications people interact with each other in a near real-time manner. In this category we find instant messaging, audio and visual communications, concurrent updates to shared objects (documents), and so on.

Each of the three P2P application categories has its own requirements for features and capabilities. They share different resources and operate in different environments. We will see that some fundamental services are needed for any P2P application, though not always in the same manner.

2. Architecture

The architecture we refer to here is that of the infrastructure of the P2P computing environment. There are several advantages to having a common P2P middleware, though it is not clear such a middleware will emerge. A common set of services will offer interoperability (but there are other ways to achieve interoperability). The main advantage of a common, and open, middleware is that application developers do not have to create their own basic services. They can focus on the unique features and capabilities of their own application.

Unfortunately, there is no such common infrastructure at present. There is no single implementation of an architecture that developers of P2P applications rally behind. Some application developers created the P2P services needed for their application. Others, however, set out to create an infrastructure, or a platform, on which P2P applications can be developed. So far, all of these platform developers address a specific type of P2P application. Essentially, they offer a partial solution to the general problem of a general-purpose P2P middleware.

We can think of the infrastructure for P2P as a middleware layer. This layer sits on top of the local operating system, which, in turn, interfaces to the underlying hardware platform. Above the P2P middleware there might be a P2P application interface layer, on top of which are the applications themselves.
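To make the layering concrete, here is a minimal sketch, in Python, of what a common middleware API might look like to an application developer. The service names and signatures are illustrative assumptions for this paper's discussion, not taken from any actual platform.

```python
# Illustrative sketch only: how an application might see a common P2P
# middleware layer. None of these names come from an actual product.
from abc import ABC, abstractmethod


class NamingService(ABC):
    @abstractmethod
    def resolve(self, name: str) -> str:
        """Map a peer or resource name to a current network address."""


class DiscoveryService(ABC):
    @abstractmethod
    def find(self, query: str) -> list[str]:
        """Return names of peers or resources matching the query."""


class SecurityService(ABC):
    @abstractmethod
    def authenticate(self, peer_id: str, credentials: bytes) -> bool:
        """Validate a visitor before granting any access."""


class Middleware:
    """Bundle of basic services an application builds on, so the
    application code never talks to sockets or directories directly."""

    def __init__(self, naming: NamingService, discovery: DiscoveryService,
                 security: SecurityService):
        self.naming = naming
        self.discovery = discovery
        self.security = security
```

The point of the sketch is the separation: applications program against stable service interfaces, while each platform supplies its own implementations underneath.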
2.1 Basic requirements

What are the realities and fundamental challenges that must be overcome to make meaningful P2P applications possible? Here are some of the important ones:

Lack of trust: Sharing of resources and access must be granted between peers who are unknown to each other.

Hardware heterogeneity: The whole environment is made up of heterogeneous components. The processors can be from different architectures or generations, with varying sizes of memory and different peripherals. The computers may range from handheld devices to a supercomputer.

Software heterogeneity: P2P applications are expected to incorporate peers running on any of the commercially available operating systems.

Network heterogeneity: The networks connecting the peers differ in bandwidth and latency by orders of magnitude.

Scale: The number of peers, or nodes, participating in the application can be as few as two or in the millions, and any number in between.

Intermittency: The systems, and configuration, are not fixed and static. Nodes may come and go, not as defined by the P2P application, but due to external interests. The presence of peers is assumed to be intermittent.

Location: Resources, such as files, storage, contributing peers, or executing programs, may move around during the session.

Autonomy: Peers agree to share resources and collaborate while maintaining autonomy and being the final authority over their local system.

Local policies: The application usually spans nodes with different security measures and administrative policies. At times collaborative peers belong to organizations that want to protect information from each other.

Distance: The peers are expected to be geographically separated.

There are environments where some of these issues don't come into play. A P2P application running on dispersed resources within a single facility of an enterprise, for example, does not face a number of the challenges listed above. But, in general, the P2P infrastructure must be resilient in the face of these realities.

To create viable P2P applications, we need to resolve several fundamental issues. These are the significant requirements for general-purpose middleware:

Cross platform. This is a requirement with multiple facets: device, hardware, and software. In support of heterogeneity, the P2P infrastructure must provide a consistent set of middleware services on a multitude of devices. These devices range from a handheld wireless device or an appliance, to a laptop and a PC, to a server farm or supercomputer. An implementation that does not comply with the full range of platforms limits the environments where it is applicable.

Interoperability. At the very basic level, a P2P application must interoperate between, at least, two computers. But this is not enough. Once the P2P middleware runs on multiple hardware and software platforms, these different peers (and servers) must cooperate in running the same application. In fact, when the reservoir of resources for the application is out there all over the net, it is inconceivable that any popular application not support interoperation between peers. Flexibility in choosing APIs contributes to the need for another aspect of interoperability. The use of shared middleware can provide a mechanism for integrated applications. Interoperability may also provide the desirable feature of migrating a P2P application from one peer to another.

Security. To make it possible and safe for users to join such a P2P application, a layer of security needs to envelop the exchanges. The security requirement covers several functions: authentication for validating visitors to your system, authorization for allowing access to your computer, and measures for assuring data integrity, confidentiality, and privacy. Security is fundamental to establishing trust between peers. Security affects any communication and access the application performs. Also, security mechanisms and policies must not override and compromise those policies that exist on each peer computer.

Local autonomy. Any service offered by the P2P middleware has to respect the autonomy of the peer. The user determines how, when, and to what extent resources are available to P2P applications. The user also determines who can access the peer computer. There are times when the conditions that are set render the peer not suitable for the P2P application. The services need a way to check and handle this finding.

Persistence. One serious difficulty with P2P computing is the intermittent presence of peers. The services must assume that peers may not be available until a task or a session is done. Yet, the application is expected to continue and complete reliably. To deal with intermittent peer presence we need transparent migration or replication of resources. When a peer disconnects from the application, another one may or may not join. It is essential that data that was on the disconnecting peer be available. The capability to move or recreate that data makes it persistent as far as the application is concerned (a small sketch of this idea appears at the end of this subsection).

Scalability. The scope of P2P applications is spread across the Internet. The number of possible peers continues to grow. The design of the architecture must allow for scaling up. All the services must be designed so the load on any one node does not increase indefinitely with the number of participating peers.

Extensibility. High-volume P2P is still in its infancy. New devices will be introduced, and new features will be added to existing devices. We can't tell what new services will be needed in the future, nor what features will prove missing in near-term services. A few years from now we may refer to the middleware described here as merely "first generation P2P." It is for these reasons that the architecture and the middleware implementation must be modular and extensible.

There are other attributes of the infrastructure without which P2P adoption will be seriously hampered. They include fault-tolerance, transparency, reliability, small footprint, performance, and development tools.

In summary, Figure 1 below depicts one way to stack the elements that the middleware services need to address. This is not a diagram of a proposed set of services, only a convenient way to list areas the middleware needs to handle. Once an architecture is defined it may be appropriate to have more than one service for some of the layers. For example, the "Naming, Discovery, Directory" layer may be implemented as three or more separate services. Similarly, the architectural solution may dictate that a layer may not have a single service associated with it at all. Security and Monitoring, for instance, may have functions associated with them integrated into other services as needed.

Figure 1. Components of a middleware layer (layers shown: Shareable Resources; Naming, Discovery, Directory; Administration, Monitoring; Identity, Presence, Community; Security).
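Returning to the Persistence requirement above, here is a minimal sketch of the replication idea: every shared item is kept on k peers, and when a peer disconnects the data is re-replicated from a surviving copy. The class and method names are invented for this example; they are not any platform's API.

```python
# A minimal sketch of k-way replication with re-replication on disconnect.
# Names and data structures are illustrative only.
import random


class ReplicaManager:
    def __init__(self, replication_factor: int = 3):
        self.k = replication_factor
        self.replicas: dict[str, set[str]] = {}   # item id -> holding peers

    def store(self, item: str, online_peers: list[str]) -> None:
        # Place the item on k randomly chosen online peers.
        self.replicas[item] = set(random.sample(online_peers, self.k))

    def peer_disconnected(self, peer: str, online_peers: list[str]) -> None:
        # Restore the replication factor for every item the peer held.
        for item, holders in self.replicas.items():
            holders.discard(peer)
            candidates = [p for p in online_peers if p not in holders]
            while len(holders) < self.k and candidates:
                holders.add(candidates.pop())


mgr = ReplicaManager()
mgr.store("doc-42", ["peerA", "peerB", "peerC", "peerD"])
mgr.peer_disconnected("peerB", ["peerA", "peerC", "peerD", "peerE"])
```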
2.2 Infrastructures by application category

This section presents infrastructure designs that target the specific areas of applications described in section 1.2. Each of the classes of applications has more than one solution, though only one is given here (others are described in [1]). There is no single "right" architecture, not even for any one segment of the P2P computing space. It is also true that no one solution is a complete and finished product. The new era of P2P computing is too young for definitive products. The platforms described here are still evolving.

We look at a representative example from each of the three application categories:

Distributed computing. This category focuses on applications where peers cooperate in solving computational problems. The cooperation is also referred to as cycle-sharing. Grid computing is closely related to P2P cycle-sharing, but its origins are different. Though cooperative computation is the impetus for solutions in this space, you will see examples of ideas and services that have far wider applicability.

Content sharing. This category includes P2P platforms that offer tools and services for applications that enable sharing and delivering of content, and sharing and distributing storage.

Collaboration. This category includes platforms that support applications where peers collaborate online. What differentiates collaboration from the two other classes is the need for synchronous presence of the peers. The peers work together on some shared resource, such as a document. Or, they engage in a real-time exchange.

2.2.1 Distributed Computing

In a P2P infrastructure, the distributed computing category includes the tools and services needed to manage resources shared across peers. The activities needed for cooperative computation include resource discovery, requesting and granting access to peer systems, naming of objects and entities, assuring data integrity, and dealing with intermittent presence of resources. When these tools are available, data can be distributed for storage or delivery. We can expect commonality in services between those provided for distributed computing and those that are content-focused. Indeed, there are applications that incorporate elements from both cycle-sharing and content-sharing. Note that a P2P search involves gaining access to peer computers, performing processing to look for items specified by the search criteria, and sending results back. The results in this case are synthesized content. We can see that the basic services needed for different types of P2P applications do have a lot in common.
Distributed computing comes in a number of flavors, depending on the application and the context in which it is run. The more common, and practical, network-based computations are those where the whole application runs on each peer, and there is no need for any communication between the peers. For example, if there are many sets of data that are to be processed independent of each other, each peer can get one set, process it, then get another one, and so on. SETI@home is such an application.

Another type of computation, called a parametric study computation, occurs when the same computation is performed for each of a predefined set of values. Think of a fluid dynamics simulation of an aircraft that has to be checked against every possible angle of attack. Each peer receives the whole application and a value of the angle to which it applies the computation.
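The following toy sketch illustrates the work-unit pattern just described: a coordinator hands out one parameter value per work unit, and each peer runs the whole application on its value. It runs in a single process purely for illustration; the function and variable names are assumptions, not any platform's API.

```python
# Sketch of a parametric-study computation. A real platform would ship
# work units over the network to peers; a queue stands in for that here.
from queue import Queue


def simulate(angle_of_attack: float) -> float:
    """Stand-in for the real computation (e.g., a fluid dynamics run)."""
    return angle_of_attack * 0.1  # placeholder result


work_units: Queue = Queue()
for angle in range(0, 90, 5):          # one work unit per parameter value
    work_units.put(float(angle))

results = {}
while not work_units.empty():          # each "peer" pulls the next unit
    angle = work_units.get()
    results[angle] = simulate(angle)   # peer runs the whole application

print(f"collected {len(results)} results")
```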
Distributed computing involves using PCs on the network to participate in a computation. These PCs may be on the Internet or within an intranet - a local network of an enterprise. Or, the computations may be performed by PCs in both environments. Grid computing used to be another form of such distributed computation - one where a number of data centers, with multiple systems, collaborate in a large-scale computation. The distinctions between network distributed computing and Grid computing are getting blurred as this discipline evolves.

In Figure 2 we see how United Devices* designed their computing network for distributed computations. The platform is called the MetaProcessor* and is managed by the MP Server. The platform consists of a specialized scheduling and distribution server; a database storing and tracking all work performed by devices on the network; and the UD Agent, which resides on each desktop and workstation on the network. The UD Agent is a small software program that runs on the peer. The UD Agent enables the MetaProcessor platform to work with the participating device for downloading applications, processing work units, and returning results and device information. The web console, also on the peer system, is the user interface for managing the platform, testing, scheduling, receiving new data, and sending results. The other components of the architecture are a central MetaProcessor (MP) server, and the database attached to it. The server manages intelligent scheduling, secure communications, workload distribution, collection of statistics, and other infrastructure management functions. The database contains information about both the application and its work units, and about the users and their UD Agents (see also [2]).

Figure 2. Distributed computing architecture from United Devices* (the MP Server and its database coordinate UD Agents on peer systems).

2.2.2 Content Sharing

The term content encapsulates messages and files of all types - text, music, graphics, video, source and binary software, and any other content that can be digitized. Content sharing encompasses several types of activities: messaging; file delivery, also known as file sharing; distributed storage and distributed shared file systems; caching and edge services; content distribution; knowledge management; and content management.

What does this mean in terms of the requirements placed on the infrastructure for content sharing? Content sharing applications require all of the services that are needed for distributed computing, with the exception of managing CPU cycles. The services also become more complex. Every peer must establish, with confidence, the identity of everyone else. Unified names are needed not just for the computer systems and the users, but for all files. Presence of peers, in the content sharing context, involves not just availability of computational results, but availability of files and storage locations as well. Finding and managing content involves more complex and ever-present discovery and monitoring capabilities. And, whereas in distributed computing all the participating peers make up a single group, content sharing environments need to manage multiple groups, or communities.

There are a number of content sharing P2P implementations, and they share some basic functionality but differ in approach. Let's look at an example from NextPage*. The technical issues addressed by NextPage include discovery and search techniques, authentication and trust, and persistence of content and its relevance.

The heart of the NextPage framework is the NXT* eContent Platform. This consists of several modules that provide the different features and services of the platform, as illustrated in Figure 3. We look here at the NXT 3 version (a more recent product will be launched by the time this paper is printed). Let's look at the main modules of NXT 3. The Content Server is the piece that aggregates and maintains content from various data sources on the Internet, intranet, or extranet. The next piece is the Content Syndicator. The syndication model enables real-time access to distributed information through a protocol that allows users to search all available information in the network. The Content Syndicator works with the Content Adapters, which provide real-time access to disparate types of data. Finding the content in the first place is the responsibility of the search engine. It provides distributed search functionality that spans the NextPage network of peers. The user can provide feedback on relevance of the results for further refinement. The final component of the Content Platform provides Security Services. NXT Security Services has three parts: authentication, authorization, and metering. Metering is the process whereby the user is further authorized based on some usage-based criteria such as number of page views, user concurrency, or resource usage.
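The metering idea can be illustrated with a short sketch: after authentication and authorization, a request is also checked against usage-based limits. This is only a reading of the concept described above, not NextPage's implementation, and all names are invented.

```python
# Illustrative metering check: usage-based authorization on top of
# ordinary authentication and authorization.
class Meter:
    def __init__(self, max_page_views: int, max_concurrent: int):
        self.max_page_views = max_page_views
        self.max_concurrent = max_concurrent
        self.page_views: dict[str, int] = {}
        self.active_sessions: dict[str, int] = {}

    def admit(self, user: str) -> bool:
        views = self.page_views.get(user, 0)
        sessions = self.active_sessions.get(user, 0)
        if views >= self.max_page_views or sessions >= self.max_concurrent:
            return False                    # usage quota exhausted
        self.page_views[user] = views + 1
        return True


meter = Meter(max_page_views=100, max_concurrent=2)
assert meter.admit("attorney-17")
```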
The Application Framework resides above the content platform. The framework provides a means for incorporating existing or new applications into the services and capabilities embedded in the NXT content platform. You can read more about NextPage technologies in the Products tab in [3].

Figure 3. The NextPage architecture for content sharing (third-party applications sit above the NXT platform modules).

2.2.3 Collaboration Frameworks

The collaborative class of P2P applications is characterized by ongoing interactions and exchanges between peers. A collaborative application may involve several users reviewing and modifying a file simultaneously. It assumes an immediate response. The collaborative applications also must respond in real time. The immediacy of the interactions exposes the environment to possible human error and misjudgment. Thus, assuring security and integrity of content becomes a greater challenge, compared with the two other classes we looked at before. This area of direct collaboration between peers is where new applications, not yet conceived, may appear.

Where distributed computing and content sharing make great use of the resource sharing capability of the P2P computing model, collaborative applications add a strong emphasis on taking advantage of the direct exchange feature enabled by the P2P model. Direct exchanges provide the immediacy that cultivates person-to-person interactions, which are the essence of collaboration.

Let us now look at one example of a P2P collaborative platform - this one is from Endeavors*. This type of infrastructure provides for interaction between humans and not just resource sharing. Direct exchanges between people are central to this platform.

Figure 4 describes the architecture of the node, or what Endeavors calls the "canonical peer." The presence of the web server on the client side, or the edge of the net, creates the client-server combination that, in Magi parlance, is called the "Magi servlet." The P2P infrastructure is run through the servlet engine. These two layers comprise the "web server foundation." Above the foundation are the core components: the Request Manager, which directs all requests to the appropriate service; the Event Service, which handles the communications between peers; the Access Controller, which manages access by other peers; and the Module Container, which dynamically loads other services and provides access to APIs used by applications built upon the core services.

Figure 4. Architecture of the Endeavors Magi "canonical peer."

The vertical columns in Figure 4 are examples of services that can be built with the help of the core components. With the inclusion of APIs these tools are a part of applications that the end user can invoke. The services, at that level, offer such capabilities as chat, meta search through WebDAV, presence information, web interfaces, and more. The user model that fosters the collaborative aspect of Magi is based on "buddy" lists and groups. See [4] and [1] for more details.
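A miniature illustration of the dispatch pattern attributed to the Magi core components above - a request manager routing each incoming request to a registered service. This is not Endeavors' code; the names and message format are assumptions for illustration.

```python
# Sketch of a request manager routing requests to registered services,
# the way the text describes the Request Manager and Module Container
# cooperating. Illustrative only.
from typing import Callable


class RequestManager:
    def __init__(self):
        self._services: dict[str, Callable[[dict], dict]] = {}

    def register(self, name: str, handler: Callable[[dict], dict]) -> None:
        # A module container would load modules and register them here.
        self._services[name] = handler

    def dispatch(self, request: dict) -> dict:
        handler = self._services.get(request["service"])
        if handler is None:
            return {"error": f"unknown service {request['service']!r}"}
        return handler(request)


rm = RequestManager()
rm.register("presence", lambda req: {"online": ["alice", "bob"]})
print(rm.dispatch({"service": "presence"}))
```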
2.3 Towards general-purpose architectures

There is no comprehensive infrastructure that encompasses all the usage models and application types that we associate with peer computing. That said, there are architectures that attempt to provide a more generic, or general-purpose, P2P platform. Two architectures worth mentioning are the middleware from Avaki*, previously known as Applied Meta, and Globus* from the Global Grid Forum community. They have common historical roots, and though they are different, this paper describes only Avaki. Avaki, the commercial successor to Legion, is built as a complete system with inter-related services and capabilities. It is a bottom-up, integrated solution. The Globus project provides a set of tools, based on common and standard protocols, that work on multiple platforms. Think of this approach as a top-down design. The necessary services are provided, and application developers decide how to couple them and which ones to use (for more on Grid computing and P2P see [6]).

Describing the Avaki approach to P2P infrastructure is an appropriate conclusion to this short survey of P2P platform solutions.

Figure 5. Avaki's distributed middleware architecture (services shown include the distributed file system, file sharing, job scheduling and distribution, remote monitoring, load balancing, recovery and fail-over, metering/accounting, program registration, utilization management, distributed directory services, authentication, encryption, access control, and protocol adapters).

The middleware platform is organized as three separate layers: a Core Services layer, a Systems Management Services layer, and an Applications Services layer.

The Core Services layer provides a set of primitives for building P2P and distributed computing applications. Protocol adapters enable the middleware to work seamlessly over a wide variety of different networking environments. This service layer also includes the interfaces for interoperability with platforms such as .NET from Microsoft and JXTA from Sun Microsystems. The security capabilities offer public key authentication, SSL data encryption, and group-based access control. The Distributed Directory Services provide a secure three-tier naming system that is used for each resource. The three tiers are: a human-readable name, a location-independent identifier, and a physical address. Each location-independent identifier includes an embedded public key. Metadata repositories and resource discovery services enable users and applications to find files or other resources. An event notification service enables publish-and-subscribe messaging between networked resources.
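The three-tier naming scheme can be sketched in a few lines. This is an illustration of the idea only; the identifiers and function names are invented, and a real system would embed a public key in the identifier rather than use a bare UUID as below.

```python
# Sketch of three-tier naming: human-readable name -> location-independent
# identifier -> current physical address. Details invented for illustration.
import uuid

human_to_id: dict[str, str] = {}     # tier 1 -> tier 2
id_to_address: dict[str, str] = {}   # tier 2 -> tier 3


def register(name: str, address: str) -> str:
    resource_id = uuid.uuid4().hex   # stands in for an ID with embedded key
    human_to_id[name] = resource_id
    id_to_address[resource_id] = address
    return resource_id


def move(resource_id: str, new_address: str) -> None:
    # The resource can relocate without its name or identifier changing.
    id_to_address[resource_id] = new_address


rid = register("project-report.doc", "peer-a.example.net:9000")
move(rid, "peer-b.example.net:9000")
print(id_to_address[human_to_id["project-report.doc"]])
```

The middle tier is what decouples naming from location: applications hold on to the identifier, and only the final mapping changes as the resource moves.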
The System Management Services layer includes the services that help organizations implement and manage distributed resources and solutions. The purpose of each of the services is identified in Figure 5. The Application Services layer provides higher-level services to construct P2P applications. The Distributed File System enables the virtual collection of storage across a wide variety of platforms and devices. File Sharing provides capabilities for access control to files that include strong authentication. Job Scheduling is priority-based, and is used for distributed computations. Job Distribution services automate the process of moving application binaries to different hardware platforms.

You can find out more technical information in [5] and [1].

3. Technical issues of P2P

This section provides a high-level overview of the main technical topics that need to be addressed by any infrastructure for P2P applications. It is convenient to group these topics into five areas:

Communication
Naming and Discovery
Availability
Security
Resource Management

Most P2P applications need access to the same fundamental services that support the issues listed above. The following subsections offer more details on each of these items.

3.1 Communication

The Internet has developed into a non-P2P network, in the sense that most exchanges rely on mediation through gateways and servers. All P2P platforms need to create communication schemes that overcome the obstacles placed by the server-based network for finding the IP addresses of devices at the edge of networks. The solutions are varied, but all use TCP or UDP over IP, though some get to these transport protocols via HTTP.

There are two main issues related to communication. First, the use of dynamic or private IP addresses makes it difficult to establish direct communication with another peer. It makes it difficult for a peer to offer a service, a capability central to P2P computing. Network Address Translators (NATs) are used for traversing between the publicly known IP addresses and those that are private or unknown externally. Second, most networks employ firewalls, for security reasons, that impede direct communication by filtering packets and limiting the port numbers open to bidirectional traffic.

A general-case solution to traversing NATs and firewalls has eluded us so far. The problem has multiple faces, as illustrated in Figure 6, and applications have a multitude of requirements.

Figure 6. NAT network topologies (single, double, and nested NATs, with private and public IP addresses).

A transparent and general solution for P2P connectivity requires agreement on gateway conventions (NATs, firewalls) that may become standards. IPv6, the new standard for IP addresses, by itself will not be the solution. Dynamic and private IP addresses will continue to be present on the Internet.

The common means for traversing the gateways involve a third party. The solutions work well, but scaling them to many thousands or millions of users may be difficult.

There is a noticeable convergence to a set of protocols. The common denominator is HTTP. It provides the interface to calls to TCP and UDP. The trend is to use XML, with more and more SOAP included, running on top of HTTP.
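As a toy illustration of the third-party approach mentioned above, the sketch below relays bytes between two peers that each dial out to a publicly reachable host, so neither needs to accept inbound connections through its NAT. It omits authentication, framing, and error handling, and is not any platform's protocol.

```python
# Minimal rendezvous relay: both NATed peers connect out to this host,
# and the relay pumps bytes between the two connections.
import socket
import threading


def pump(src: socket.socket, dst: socket.socket) -> None:
    while data := src.recv(4096):   # empty bytes means the peer closed
        dst.sendall(data)           # forward everything to the other peer


def relay(listen_port: int) -> None:
    server = socket.create_server(("0.0.0.0", listen_port))
    conns = [server.accept()[0] for _ in range(2)]  # wait for both peers
    threading.Thread(target=pump, args=(conns[0], conns[1]),
                     daemon=True).start()
    pump(conns[1], conns[0])

# relay(9000)  # would run on a publicly reachable host
```

The scaling concern noted above is visible even in this sketch: every byte exchanged between the peers transits the third party.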
3.2 Naming and Discovery

The Internet is not a static entity. Communication messages are going from place to place as content migrates through distribution and downloading. Content can be reproduced through replication and caching. Computers, or nodes on the net, are connected one moment and gone the next. Users move around between a desktop system, a mobile laptop, or a wireless device. And with all this activity, the P2P services need to keep track of where everything is, how to find it, and what to call it.

First, there is the namespace issue. The traditional tool for naming and namespaces on the Internet, the domain name system (DNS), is not adequate for P2P applications. If you get assigned an IP address dynamically, it might take days before your name-address mapping is known across the vast array of DNS servers. It simply does not work when a peer is looking for another peer. Even when we have a way of finding a peer's current address, and a method to start the communication between the two peers, the DNS is still not the right tool for mapping names to addresses. DNS is not responsive enough to the dynamic and intermittent presence of peers engaged in P2P computing. But no standard exists (other than DNS), and so developers came up with their own (see [1] for some examples). The use of XML for describing namespaces and names appears to be a common thread in current solutions. Perhaps the trend will lead to a level of standardization in P2P naming schemes.

Next, we need to find out about things. P2P frameworks need to search, discover, and register findings in directories. Discovery, in the P2P context, includes the protocols and the mechanisms by which peers and P2P applications become aware of the resources and services they need. Directories are where the findings of the discovery are placed. What are the things that need to be discovered in a P2P environment?

Resources. What is out there that my application needs?
Presence. Who is connected? Who is a member (even if not connected)?
Content. What are the topics discussed by this group?

In particular, resource discovery is fundamental to P2P. The resources peers are looking for vary by the application, the group, or the community to which they belong.

There are well-accepted protocols in existence that can be used for P2P-style discovery and registry. UDDI (Universal Description, Discovery and Integration), LDAP (Lightweight Directory Access Protocol), and even WebDAV (Web Distributed Authoring and Versioning) are all used by P2P platforms today.

Why do we need search capabilities that are specific to P2P? The current search method we all use employs a massive collection of servers that store all the information they have indexed from all over the Internet. These static engines that send crawlers do not revise data fast enough, and, more importantly, don't even reach the peer systems. The well-known search engines search web servers. They are not likely to poke around our desktops (nor are they permitted to).

So, one reason for P2P search techniques is that peers need to search within the P2P network. Peer networks and P2P search can also assist the traditional search engines. The peers are able to find information the large search engines' crawlers cannot reach. There are two aspects of searching where P2P can be useful. First, peers may conduct the search itself. There are circumstances when this method is more accurate and practical. Second, the servers containing the search engine database can communicate as peers. The database is distributed and replicated in a way that will eliminate the single-point-of-failure risk.

Most significantly, peer computing opens up new opportunities in which to enhance the scope of discovery and meaningful searches on the web. For example, P2P-based search can go beyond searching web pages only - it can get to other types of shared files. Search can be bounded to communities of shared interest. And, within a P2P community, it may be easier to conduct searches by context.
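A minimal sketch of a peer directory covering the three discovery targets listed above - resources, presence, and content topics. The heartbeat-based presence check is an assumption for illustration; a deployed system might expose this through LDAP, UDDI, or an XML protocol instead.

```python
# Illustrative peer directory: peers register what they share and what
# they discuss; presence is inferred from a recent heartbeat.
import time


class PeerDirectory:
    def __init__(self):
        self.entries: dict[str, dict] = {}

    def register(self, peer: str, resources: list[str],
                 topics: list[str]) -> None:
        self.entries[peer] = {
            "resources": resources,
            "topics": topics,
            "last_seen": time.time(),    # presence: refreshed on heartbeat
        }

    def find_resource(self, wanted: str, max_age: float = 60.0) -> list[str]:
        now = time.time()
        return [p for p, e in self.entries.items()
                if wanted in e["resources"] and now - e["last_seen"] < max_age]


d = PeerDirectory()
d.register("peer-a", resources=["cycles", "storage"], topics=["astronomy"])
print(d.find_resource("storage"))
```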
3.3 Availability

P2P networks harness resources at the edges of the net, relying on components over which the application has little or no control. The participants exercise local autonomy over their own systems, and in many circumstances the peers join and leave the network at will. Communications are carried over external networks, which are sometimes unreliable. Often, the systems on which computations are run are unknown. Content that is found may disappear or move the next time it is sought. And the list goes on.

So, the issues are those of intermittent connectivity, availability of resources, and fault-tolerance in the face of a very dynamic setting. More specifically, the P2P infrastructure, or the P2P application, needs to address and resolve:

Managing intermittent presence
Fault-tolerance
Maintaining content coherency
Availability of resources
Availability of content
Fault-tolerant use of servers
System management

The manner in which solution providers address the issue of changing presence and availability is more dependent on the application's setting than on the application itself. It is prudent to take preventive action. Content is being replicated and cached. And a number of solutions establish a third-party presence, which will always be connected, and where messages can be queued. Most P2P networks, if not all of them, share the issue of keeping content persistent. They share the issue of gracefully continuing when peers disconnect, and updating their states when they rejoin.

It is interesting to note that a few solutions to the intermittent presence of peers have led to setups that improve performance. When data is replicated, network traffic can be further optimized by selecting alternate routes. Multiple copies allow for load balancing of processing and communications.

And even though a P2P network is a fairly loose organization, the issues of fault-tolerance and availability require a system-wide view. The interplay between the peers' local autonomy and the desire to offer a reliable setting for applications will continue to be challenging. And it will certainly require more attention. One of the least glamorous aspects of software projects is maintenance: what needs to be in place and the tasks that need to be done to keep things going. This is particularly important for P2P computing, given the premise of the concept.
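The failover behavior this subsection calls for can be sketched as a simple retry across replicas: when the first holder of a piece of content has disconnected, the fetch moves on to the next copy. fetch_from() is a hypothetical transport call standing in for whatever the platform provides.

```python
# Illustrative failover across replicas of an item.
def fetch_from(peer: str, item: str) -> bytes:
    raise ConnectionError(f"{peer} is offline")   # placeholder transport


def fetch(item: str, replicas: list[str]) -> bytes:
    last_error = None
    for peer in replicas:                 # alternate routes / load balancing
        try:
            return fetch_from(peer, item)
        except ConnectionError as err:
            last_error = err              # peer gone: try the next copy
    raise RuntimeError(f"no replica of {item!r} reachable") from last_error
```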
3.4 Security

Security issues and lack of trust are the highest-ranked concerns with regard to P2P computing. Opening up your system for access by others can be disconcerting. The security that users demand boils down to these statements:

I need to be sure that the person I am communicating with is who he or she claims to be.
I need to control when and how people access resources on my system.
I need to have confidence that my messages are not read or modified en route.

Security has become a major concern on the Internet, with or without P2P. Users are connected for long periods of time - some are always connected, as is common with users of broadband. This makes them more vulnerable to attacks. Web browsers make it very easy to download and execute applications that cannot be trusted. Online commerce involves transmissions of confidential and private information.

Peer-to-peer computing adds other risk factors. Peers act as servers, and their systems offer services. They are available to requests made by other entities, which aren't always known. Peers offer resources on their systems for use by others. They need to shield that which isn't shared. They need to manage the level and mode of access. Most of the owners of the peer systems have no prior experience in managing a server or a service. Servers on the web have full-time administrators who monitor security advisories, so they can immediately patch their servers when new vulnerabilities are discovered.

The P2P infrastructure or application must address these issues to create a secure computing environment for the participating peers. The potential risk is very real - here are some of the actions that can occur around P2P:

Direct communications that include general file sharing may result in peers installing new applications on their host machines. When you get an application from another domain, you don't know what side effects this new application may have on your system, nor whether it may damage your local network.

P2P communication schemes often provide a way to tunnel through a firewall. Once a port is open to direct communication, the content that's passing through may not be monitored by the organization's firewall software. Without carefully constructed safeguards, you risk exposing content and files to others outside your local network and external to your organization.

When you request and download a file from another peer, you often do it based on blind faith - you are simply assuming that the file that you receive really is the file you want.

Some of the popular file-swapping networks don't offer sufficient protection against "guests" snooping around your system. When someone can access your browser's cookies, they can steal your online identity.

Instant messaging applications are, in general, an insecure communication mechanism. Despite this risk, people often use IM to exchange sensitive information. Of course, the same security risk exists in our regular email system, but IM gives us the illusion the exchange is a private conversation.

These scenarios, among others, raise concerns and alarm within the IT ranks, and justifiably so. The security risks should be a concern to the home user, too, and for the same reasons.

So, what are the basic security features that will make us feel safe in the P2P world? The bare minimum a peer needs is:

Authentication: Be certain of the identity of the peer communicating with you.
Authorization: Control what others can do on your system.
Data integrity: Have confidence that data you send or receive was not tampered with.

Along with the technical aspects listed above, there are social factors that can add to, or detract from, our sense of security. Trust and reputation are central to building online relationships, just as they are in our "real" world.

What is expected of a P2P security solution?

Integrate with local security measures, and adapt to local policies. Do not compromise what's there.
Provide a framework for access control. This is the utility through which authorization policies are exercised.
Do not expose information on identities and data of others to unauthorized users.
Delegation of tasks and authority has to respect the rules set by the owners of the identities involved.
Secure data has to be protected on the wire if sent to another location.

Details of security solutions are outside the scope of this paper (but see chapter 9 in [1]). We can state, however, that solutions do exist. There are well-understood procedures for two-party and mutual authentication. There are implementations for authorization followed by mechanisms for access control, sometimes with sandboxing. And there are readily available processes for data encryption, digital signatures, and techniques for validation of computations. What we do not have is an agreed-upon standard method for each of the security-related operations.
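As one example of the readily available building blocks mentioned above, here is a data-integrity check built on a keyed message-authentication code from Python's standard library. The shared key is assumed to have been established during authentication; real platforms would combine this with public-key authentication and encryption.

```python
# Protecting data integrity in transit with an HMAC over each message.
import hmac
import hashlib

shared_key = b"established-during-authentication"   # illustrative only


def sign(message: bytes) -> bytes:
    return hmac.new(shared_key, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(sign(message), tag)


msg = b"work unit 1234 results"
tag = sign(msg)
assert verify(msg, tag)                    # unmodified: accepted
assert not verify(msg + b"tampered", tag)  # modified en route: rejected
```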
3.5 Resource Management

Peer-to-peer computing is about two things - direct communications between peers, and sharing resources at the edges of the net. This is a very brief overview of managing these resources, which are scattered all over, come in many shapes and forms, and are owned by thousands of different individuals and institutions.

What are these resources that we can share through the magic of P2P technologies?

Computational power. This is the resource that is central to distributed computing. It is also referred to as cycle sharing.

Storage and files. Peers store files that are shared with others - envision an application where unused storage is used to store files that are not shared with the owner of the host system. An IT organization may use such space for backup or the replication of files belonging to members of the organization.

Network bandwidth. P2P infrastructure can be applied to route network traffic of messages and file transfers. The peers then share the bandwidth of connections emanating from their systems.

These three items represent shared hardware elements. But there are two other resources that are shared in the P2P world. They are:

Content. Content uses storage. But sharing storage does not necessarily imply that the content of these files is shared, although that might be the case. Similarly, you can share content with your peers in a way that does not involve sharing storage. The shared content may be somewhere on a server, or it might be moving between locations. The item that is shared is the content, not the physical medium.

People. Well, we don't share people. But the direct exchanges and real-time interaction with people in a collaborative setting are a shared resource.

Not all P2P implementations deal with every resource type listed above. Some focus entirely on cycle sharing, with the run-time files needed, of course. Others focus on content sharing and manage file sharing. And those solutions that address all the aspects of P2P need to create mechanisms for managing all the resource types.

There are a few steps in using resources in P2P computing that are independent of the type of resource: The peer or the application needs to be able to discover the resource. The discovery includes learning about the resource's state and structure. The P2P application needs to establish access to the resource. This includes ensuring it can abide by the resource's local policies. And, for some applications, scheduling is also necessary. The host where the resource resides has the ability to authenticate the requesting entity. And it can enforce its local policies on the application that asks for use of the resource.

Resource management is an area where each implementation does something different. The tools for managing the resources are specific to the application and the settings in which it runs. This is certainly an area where more work can be done to find commonalities, and perhaps recycle existing capabilities.
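A sketch of the resource-usage steps listed above, seen from the host's point of view: authenticate the requester, then apply local policy before granting access. The credential scheme and names are invented for illustration.

```python
# Illustrative host-side gatekeeping: the host remains the final
# authority over its own shared resources.
class Host:
    def __init__(self, shared_resources: set[str], trusted_peers: set[str]):
        self.shared = shared_resources
        self.trusted = trusted_peers

    def authenticate(self, peer: str, credential: str) -> bool:
        return peer in self.trusted and credential == f"token-for-{peer}"

    def request_access(self, peer: str, credential: str,
                       resource: str) -> bool:
        if not self.authenticate(peer, credential):
            return False                    # unknown or unproven peer
        return resource in self.shared      # local policy is final


host = Host(shared_resources={"cycles"}, trusted_peers={"peer-a"})
assert host.request_access("peer-a", "token-for-peer-a", "cycles")
assert not host.request_access("peer-a", "token-for-peer-a", "storage")
```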
4. Success Stories

We cannot do justice, in the space we have here, to the wealth of possibilities and actual applications of P2P technologies to real-life cases. All I hope to achieve here is to provide a taste of some of the innovative usages of P2P computing. The P2P architectures described earlier are grouped into three categories (in addition to general-purpose ones): collaborative, content sharing, and distributed computing. The same categories are applied here to examples of applications and usages of P2P technologies. The purpose of this section is to give an impression of the kind of applications people are developing for use in P2P computing; this is certainly not meant to be an in-depth survey of P2P applications.

4.1 Interactive Collaborations

Collaboration happens between people. At least, this is how we differentiate the P2P collaborative application from others. But collaborative applications may well include sharing of content and (perhaps) joint computations, too.

P2P collaborations include applications as diverse as these:

Creation of online communities
Development projects
Gaming communities
Computing communities
Joint virus protection

The first item is more general in nature compared to the others; the latter types of communities are more specific. Here are some examples:

Firstpeer* (http://www.firstpeer.com) created an online marketplace that enables sellers, suppliers, and vendors to be integrated into the Firstpeer framework with their existing technologies and business models. The application employs a P2P architecture to distribute the marketplace functions to its participants, eliminating the need to replace the tools they currently use.

One of the early corporations to try the Groove collaborative platform is GlaxoSmithKline, a pharmaceutical company. They started with a pilot program, and asked their employees to suggest ways in which Groove could be used for online collaboration. The range of ideas and opportunities is impressive. Staff members from the Legal department wanted to review and edit documents with other parties. A development team suggested a test of new software. Scientists and researchers wished to exchange information with external colleagues. The Operations department wished to share documents with contractors and suppliers.

Oculus Technologies Corporation focuses on integration between applications. The Ford Motor Company has plans for CO*, the Oculus application. The Ford team will utilize Oculus CO to connect members of a geographically-dispersed design team working with different software applications on different operating systems. The company chose Oculus CO because the P2P collaboration promises to improve the design process. It will allow Ford engineers to evaluate many more design iterations in a much shorter period of time.

Communities of collaborators have been formed around a variety of Grid computing projects. They are numerous and, in addition to computations, also include managing and analyzing data from high-energy physics, earth observations, and biology applications; a virtual laboratory for earthquake engineering; communities of developers of software tools and technologies; communities for development of applications for nuclear and for high-energy physics; data analysis from astronomical observations; and many other grid-related communities.

Another kind of a community can be set up for protection against digital virus attacks. A coalition of software agents at the Sandia National Laboratory successfully protected five network-linked computers over two full working days of concentrated attack by a four-person hacker force.

And there are more such fascinating examples (see [1], for example).
4.2 Relevant Content

Sharing content in the P2P context deals with the techniques and technologies for effective ways to discover content's existence and its locations, and to get it delivered. When it comes to usages and applications that rely on content sharing, you cannot avoid getting involved with information and knowledge management. The filtering of data so that only meaningful data comes to you, and associating it with related information, becomes an integral part of the application. The power of direct exchanges and discovery searches between peers can be used to enhance the effectiveness of content management. Here are a couple of examples.

Baker & McKenzie chose NXT from NextPage for nearly 2,800 of its attorneys. Attorneys at Baker & McKenzie can use NextPage to search throughout the firm's network for a document containing a certain word or name. With peer-to-peer technology, computers can share information without going through a central repository. This helps in getting the most recent information, and reduces the burden of keeping a central repository. Baker & McKenzie attorneys using NextPage will have access to all relevant documents residing on the firm's computers around the world, as well as those of select customers.

Opencola* has a product called Swarmcast* that makes use of network bandwidth. It uses many peers to deliver pieces of a requested file. A prime example of Swarmcast in action occurred when Mandrake 8 was recently released. The Mandrake community became aware of the beta Swarmcast demo and people started downloading with Swarmcast. There were between 20 and 60 people in the mesh of the files at all times. That means 20 to 60 computers available to serve the next download request simultaneously. In a three-day period, Swarmcast helped serve up over 200 Mandrake images, which is over 130 GB of data!

Mangosoft* and BlueFalcon* have products used for sharing files by taking advantage of peer communities to utilize bandwidth and replication for efficient delivery. And this is only the tip of the iceberg.

4.3 Real Computations

Distributed computing is a very active area in P2P computing. There are quite a number of companies that are engaged in some form of distributed computations. Here are examples in three flavors of computations:

Distributed computing on the Internet
Distributed computing within the intranet
Grid computing applications

Two examples of large-scale computations on the Internet: Intel and United Devices joined forces with the American Cancer Society, the National Foundation for Cancer Research (NFCR), and the University of Oxford to advance cancer research with the help of distributed Internet computing. The first application involves leukemia drug-optimization research. Researchers at the University of Oxford are already using findings from early results in the project; they are refining later parts of the study and suggesting further experiments. With the overwhelming member response to this project, the University of Oxford is in the fortunate position of being able to expand the scope of their research project. They are taking the opportunity to further investigate promising discoveries almost as soon as they are uncovered in the course of this virtual drug-screening project.

Another distributed computing project that is looking for a cure is a joint cause-computing project between Entropia and the Olson Laboratory at the Scripps Research Institute, called "Fight AIDS at Home." Here are some details reported by the researchers of the Olson Lab at the Scripps Research Institute in a recent volume of the FightAIDSatHome members newsletter: At the end of July 2001 there were 32,000 machines computing on behalf of the FightAIDSatHome project, with a combined power of almost 15 teraflops. This virtual supercomputer has about 4.5 million MB of RAM and about 50 million MB of disk space. FightAIDSatHome has already clocked up a staggering total of about 2,000 years of computation, as measured by what a fast workstation can compute in a year, making it probably the biggest anti-HIV drug-design calculation performed in history.
Mangosoft* and BlueFalcon* have products used for sharing files; they take advantage of peer communities to utilize bandwidth and replication for efficient delivery. And this is only the tip of the iceberg.

4.3 Real Computations

Distributed computing is a very active area in P2P computing. Quite a number of companies are engaged in some form of distributed computation. Here are examples in three flavors of computation:

Distributed computing on the Internet
Distributed computing within the intranet
Grid computing applications

Two examples of large-scale computations on the Internet:

Intel and United Devices joined forces with the American Cancer Society, the National Foundation for Cancer Research (NFCR), and the University of Oxford to advance cancer research with the help of distributed Internet computing. The first application involves leukemia drug-optimization research. Researchers at the University of Oxford are already using findings from early results in the project; they are refining later parts of the study and suggesting further experiments. With the overwhelming member response to this project, the University of Oxford is in the fortunate position of being able to expand the scope of its research, and it is taking the opportunity to investigate promising discoveries almost as soon as they are uncovered in the course of this virtual drug-screening project.

Another distributed computing project that is looking for a cure is a joint cause-computing project between Entropia and the Olson Laboratory at the Scripps Research Institute, called "Fight AIDS at Home". Here are some details reported by the researchers of the Olson Lab in a recent volume of the FightAIDSatHome members' newsletter: at the end of July 2001 there were 32,000 machines computing on behalf of the FightAIDSatHome project, with a combined power of almost 15 teraflops. This virtual supercomputer has about 4.5 million MB (4.5 TB) of RAM and about 50 million MB (50 TB) of disk space. FightAIDSatHome has already clocked up a staggering total of about 2,000 years of computation, as measured by what a fast workstation can compute in a year, making it probably the biggest anti-HIV drug-design calculation performed in history.

More and more, we observe distributed computing within the enterprise, on the local network. The corporate environment is easier to manage than resources on the Internet, and sensitive data can be secured. DataSynapse provides one of a number of such examples. DataSynapse focuses on financial institutions and their applications. Financial institutions run a number of computationally intensive applications that can be parsed into smaller tasks and executed in parallel. In other words, a large computation can be divided among underutilized PCs scattered around the organization.
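The pattern these projects share is task farming: a large, divisible computation is split into independent work units and scattered across underutilized machines. The sketch below shows the shape of the idea, with a local thread pool standing in for remote PCs; the function names and the dummy scoring function are hypothetical, not DataSynapse's or United Devices' actual APIs.

```python
# Minimal sketch of task farming: independent work units are dispatched
# to a pool of workers and the results are gathered and ranked. In a
# real deployment each worker would be a remote, underutilized PC
# polling a coordinator for its next unit of work.

from concurrent.futures import ThreadPoolExecutor

def score_candidate(candidate: int) -> float:
    """Stand-in for an expensive, independent unit of work, e.g. scoring
    one drug candidate against a protein target."""
    return (candidate * 2654435761 % 2**32) / 2**32  # dummy scoring

def farm_out(candidates, n_workers=8):
    """Dispatch work units to the pool and keep the best results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        scores = list(pool.map(score_candidate, candidates))
    ranked = sorted(zip(scores, candidates), reverse=True)
    return ranked[:10]   # most promising candidates for follow-up

if __name__ == "__main__":
    best = farm_out(range(10_000))
    print("top candidate:", best[0])
```

The approach works precisely because the work units are independent: a unit lost to a machine that disconnects can simply be reissued, which is what makes intermittently connected desktop PCs usable as compute resources.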
Grid computing claims many projects, some of which were described above. There are two important differences between Grid computing and the examples above of Internet distributed computing. First, Grid computing offers a more comprehensive set of services: it supports collaborations between nodes at different locations, distributed storage, data visualization, and more. Its scope is broader than what we associate with distributed computing on the Internet or on intranets. Second, architecturally, traditional Grid computing projects are about connecting and distributing computations and storage among a relatively small number of powerful data centers. Contrast this with Internet distributed computing, which may involve millions of PCs.

"Web testing" is an interesting by-product of platforms for network-based computations. These platforms provide the ability to test performance aspects of the web in ways that are not otherwise possible. Distributed computing companies such as United Devices* and Entropia* have recognized the opportunity and the need, and will be offering tools for measurements on the net. The beneficiaries of web testing are likely to be various network service providers and IT organizations within the enterprise.

5. Challenges facing P2P

The preceding sections provided an overview of the technical issues facing widespread adoption of P2P. There are challenges, and there are some impressive successes. There are certainly many opportunities.

5.1 Technical challenges

First, the technical challenges. We saw that the available solutions, in general, each address a specific computing environment or application. A couple of challenges were not addressed directly, but they should be mentioned here. I break down the technical challenges of P2P into these areas:

Connectivity. The consequence of the current environment is that P2P services need to include mechanisms for locating peers that don't have public IP addresses, and mechanisms for communicating through firewalls. The problem is aggravated by the fact that there are different types of NATs and firewalls. There is no universal solution to naming and to mapping names to network addresses, and there is no general-purpose solution to the traversal of firewalls.
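One widely used workaround for the reachability half of this problem is a rendezvous, or relay, node: a peer behind a NAT cannot accept inbound connections, but it can usually open outbound ones, so it registers with a publicly reachable node and receives messages through it. The sketch below models only the message flow; the class and names are illustrative assumptions, and a real relay would run over outbound TCP or HTTP connections rather than an in-memory mailbox.

```python
# Schematic sketch of relaying through a rendezvous node so that peers
# without public IP addresses can still be reached.

class Relay:
    """Publicly reachable node that forwards messages to NATed peers."""
    def __init__(self):
        self.mailboxes = {}          # peer name -> queued messages

    def register(self, peer_name):
        # A NATed peer "registers" by keeping an outbound connection
        # open; here that is modeled as a mailbox it polls.
        self.mailboxes[peer_name] = []

    def send(self, to_peer, message):
        if to_peer not in self.mailboxes:
            raise KeyError(f"unknown peer {to_peer!r}")
        self.mailboxes[to_peer].append(message)

    def poll(self, peer_name):
        queued, self.mailboxes[peer_name] = self.mailboxes[peer_name], []
        return queued

# Usage: "alice" has no public IP address, yet "bob" can reach her.
relay = Relay()
relay.register("alice")              # outbound registration from behind NAT
relay.send("alice", "hello from bob")
print(relay.poll("alice"))           # ['hello from bob']
```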
Security and Privacy. In many ways, the security issues for P2P are similar to those of any Internet user. The commonality applies to what happens while messages move between locations. The added risks to the P2P user arise from allowing access to the peer's system. Users are looking for strong security on their devices. Consistent security measures should apply across the namespaces created by different P2P applications or platforms.

Fault-tolerance and availability. The two main challenges here are the intermittent connectivity of resources, and how to discover those resources. These issues will continue to add complexity to P2P solutions.

Performance and Bandwidth. This is an issue that needs to get more attention after the basic P2P services are in place. The topics, barely mentioned in this paper, are latency and bandwidth.

Scalability. Scalability is assumed to be a problem whenever we talk about the Internet and connecting peers. Clearly, we expect a general-purpose P2P infrastructure to scale with the growth of the number of participating peers. In general, any architecture with a flat structure is susceptible to stalling when it reaches a certain size, in terms of the number of peers. Either hierarchies or "neighbor" relationships may be necessary for the environment to scale.

Self-management of systems. In P2P, individual users provide services to others. They act in a limited capacity as servers. Many of these users do not have the experience or the tools to manage a server. Users are not comfortable making their resources available to others without a sense of control and monitoring. IT managers are nervous about loss of control over the network and about exposing enterprise resources.

Interoperability. The P2P infrastructures described here, and all others, are not interoperable with each other. They share a small set of communication protocols, and most use XML and SOAP. Yet the end result is that we do not have the interoperability, or the bridges, for crossing over from one P2P framework to another.

Complexity. Complexity is a good term to wrap around the technical challenges that are barriers to wide acceptance of P2P. Complexity, and the technical issues that bring it about, result from taking the computing action to the edges of the net. We now have multiple autonomous entities, multiple points of entry into a network, and distributed resources. Security, monitoring, and administration seem much more complex; at least they will as long as we examine the new environment from a centralized-mindset vantage point.

5.2 Social and mindset challenges

There are certain behavioral and social types of challenges that P2P evokes. I will list some of them here.

The concept of self-organized online communities is central to quite a few P2P applications. We saw that intermittent presence, and moving the control of resources to the edges, cause a number of technical challenges. The loss of central control is also difficult for IT organizations, where the administration and monitoring of resources is centralized. Indeed, today's shared resources are centralized, or controlled by a central authority.

Changing the mindset from a static center providing resources to the nodes on the edge, to that of a dynamically changing reservoir of resources at the edge, is difficult. First, one needs to think "distributed." Then, one needs to be comfortable with resources moving around and switching irregularly between being connected and being disconnected.

IT managers have concerns that contribute to their reluctance to embrace P2P. The concerns focus on security, the possibility of introducing unknown components, managing distributed resources, and integration with existing applications and environments.

There is a cultural challenge of creating an atmosphere that welcomes non-expert users who wish to step forward and start an online forum with others. A big part of this challenge is how to create trust between remote and unknown users. This is related to the ways in which we can develop meaningful reputation metrics associated with peers. Users also wish to be protected from unwanted and undesirable content. This problem, though true for the Internet in general, can be aggravated by P2P services.

And, of course, there is much talk about the business viability of, and business models for, P2P. This becomes a non-issue if we think of P2P as a computing model and a set of technologies that will eventually be adopted by the mainstream computing community.

6. Looking Ahead

The P2P story is still unfolding. Not only is the story not finished, but we are probably only in the very early phase of learning what can be done with P2P technologies. Even the technologies themselves are still evolving. There are numerous industry-wide projects and initiatives that aim at promoting and advancing P2P computing. Some are major product developments by companies such as Microsoft* and Sun Microsystems*. Other activities are performed jointly, as industry working groups and consortia (see [7] through [11]).

There is much relevant research and development that does not carry the label of P2P, yet is instrumental to its success. This includes research into distributed storage systems over the Internet, the development of techniques and products using intelligent agents, and the standardization of metadata schemes.

P2P came on the (mainstream) scene with a bang. Now the dust has settled, and we can clearly see all the work that is still to be done. There are technical and social challenges that need to be overcome. And there is the great promise of the wonderful ideas people have for applications, which, in many ways, bridge the gap between the real world and the cyber-world. They are about interacting and sharing at the edges, where the people are.

What we have now are the pieces. The early P2P pioneers showed us what could be done, and they will tell us what else they can do. Startup developers and corporate architects are getting together, finding unifying principles, common standards, and fundamental services on which to build a P2P structure. It is a comprehensive and complete model for computing; a model that coexists with, and enhances, the Internet models of today.

What can we expect to see when P2P is used as casually as "client-server" is used today? P2P will become an integral part of the solution to security on the Internet, an issue that exists independently of the arrival of P2P. P2P will present new ways to address the growing issue of extracting meaning and relevance from the information glut presented by the Web. We will witness huge populations of users participating in solving computational problems too large to solve before: problems that will advance our understanding of diseases, our environment, and our changing climate. We will witness an acceleration of the breaking down of geographical and cultural boundaries through the formation of online communities based on shared values and shared interests.

And there will probably be much more.

References

[1] Barkai, D., Peer-to-Peer Computing: Technologies for Sharing and Collaborating on the Net, Intel Press, Hillsboro, OR (2001).
[2] The MetaProcessor* Platform, http://www.ud.com/products/mp-platform.htm
[3] NextPage, http://www.nextpage.com/main.asp
[4] Endeavors Technology, Technical Papers, http://www.endtech.com/papers.html
[5] Avaki, Literature, http://www.avaki.com/papers/index.html
[6] Foster, I., Kesselman, C., Tuecke, S., The Anatomy of the Grid. To be published, Intl. J. Supercomputer Applications, 2001. http://www.globus.org/research/papers/anatomy.pdf
[7] P2P Working Group, http://www.p2pwg.org
[8] The New Productivity Initiative, http://www.newproductivity.org/npi-in.html
[9] Global Grid Forum, http://www.gridforum.org
[10] Microsoft .NET, http://www.gotdotnet.com/p2p
[11] Sun Microsystems JXTA, http://www.jxta.org