
IEEE TRANSACTIONS ON COMPUTERS, VOL. C-33, NO. 12, DECEMBER 1984

A Perspective on Distributed Computer Systems

JOHN A. STANKOVIC, MEMBER, IEEE

(Invited Paper)

Manuscript received May 7, 1984; revised July 14, 1984. The author is with the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA 01003.

Abstract - Distributed computer systems have been the subject of a vast amount of research. Many prototype distributed computer systems have been built at university, industrial, commercial, and government research laboratories, and production systems of all sizes and types have proliferated. It is impossible to survey all distributed computing system research. Instead, this paper identifies six fundamental distributed computer system research issues, points out open research problems in these areas, and describes how these six issues and solutions to problems associated with them transect the communications subnet, the distributed operating system, and the distributed database areas. It is intended that this perspective on distributed computer system research serve as a form of survey, but more importantly to illustrate and encourage a better integration and exchange of ideas from various subareas of distributed computer system research.

Index Terms - Communications subnet, computer networks, distributed computer systems, distributed databases, distributed operating systems, distributed processing, system software.

I. INTRODUCTION

A distributed computer system (DCS) is a collection of processor-memory pairs connected by a communications subnet and logically integrated in varying degrees by a distributed operating system and/or distributed database system. The communications subnet may be a widely geographically dispersed collection of communication processors or a local area network.

The widespread use of distributed computer systems is due to the price-performance revolution in microelectronics, the development of cost effective and efficient communication subnets (which is itself due to the merging of data communications and computer communications), the development of resource sharing software, and the increased user demands for communication, economical sharing of resources, and productivity.

A DCS potentially provides significant advantages, including good performance, good reliability, good resource sharing, and extensibility [31], [36], [56]. Potential performance enhancement is due to multiple processors and an efficient subnet, as well as avoiding the contention and bottlenecks that exist in uniprocessors and multiprocessors. Potential reliability improvements are due to the data and control redundancy possible, the geographical distribution of the system, and the ability for mutual inspection of hosts and communication processors. With the proper subnet and distributed operating system, it is possible to share hardware and software resources in a cost effective manner, increasing productivity and lowering costs.

Possibly the most important potential advantage of a DCS is extensibility. Extensibility is the ability to easily adapt to both short and long term changes without significant disruption of the system. Short term changes include varying workloads and subnet traffic, and host or subnet failures or additions. Long term changes are associated with major modifications to the requirements of the system.

In trying to achieve the advantages of DCS's, the scope of research has been very broad.
In spite of this, there is a relatively small number of fundamental issues dominating the field. Solutions to these fundamental issues have not yet been consolidated in a comprehensive way, thereby thwarting the full potential of DCS's.

After a brief overview of DCS research (Section II), this paper provides a perspective on six fundamental DCS issues (the object model, access control, distributed control, reliability, heterogeneity, and efficiency), identifies problems associated with these issues, shows how these issues interrelate, and describes how they are addressed in different subareas of DCS research (Section III). It is intended that this perspective on DCS research serve as a form of survey, but more importantly to illustrate and encourage a better integration and exchange of ideas from various subareas of DCS research.

To keep the scope of this paper reasonable, two fundamental issues, research in the theory and specification of distributed systems and the need for a distributed systems methodology, are not specifically discussed. A theory of distributed systems is needed to better understand theoretical limitations and complexity. Specification languages must be extended to better treat parallelism, reliability, the distributed nature of the system being specified, and the correctness of the system. A methodology for the design, construction, and maintenance of large complex distributed systems is necessary. This methodology must address the specific problems of DCS's such as distribution and parallelism. Finally, Section IV contains summary remarks.

II. DISTRIBUTED COMPUTER SYSTEMS

DCS research encompasses many areas, including: the communication subnet, local area networks, distributed operating systems, distributed databases, concurrent and distributed programming languages, specification languages for concurrent systems, theory of parallel algorithms, parallel architectures and interconnection structures, fault tolerant and ultrareliable systems, distributed real-time systems, cooperative problem solving techniques of artificial intelligence, distributed debugging, distributed simulation, and distributed applications [23], [47], [89]. There are also the associated efforts of modeling and analysis, testbed development, measurement and evaluation, and prototype implementations. An extensive survey and bibliography would require hundreds of pages. In this section we concentrate on briefly categorizing and identifying actual distributed computer systems, rather than all related research. At the end of this paper there is a list of textbooks, survey articles, and references to provide further information on these research areas. For a more extensive survey of DCS research issues see [126].

Strictly speaking, a DCS is the sum total of the physical network and all the controlling software. Therefore, each category discussed below (local area networks, wide area networks, network operating systems, distributed operating systems, distributed file servers, distributed real-time systems, and distributed databases) actually refers to a particular aspect of a DCS and not an entire DCS.

Local Area Networks: According to Stallings [118], "a local network is a communications network that provides interconnection of a variety of data communicating devices within a small area." A small area generally refers to a single building or possibly spanning several buildings.
A network with a radius of 20 km would border between a local network and a long haul network. Local networks are sometimes classified into three types: a local area network (LAN), a high speed local network (HSLN), which is typically found in a large computer center, or a digital switch/computerized branch exchange (CBX). Typical LAN's are the ring, the baseband bus, and the broadband bus. HSLN's and CBX's are not discussed in this paper.

Common protocols for accessing ring LAN's are the Newhall (token) protocol [38], [39], the IEEE 802 token ring protocol [118], the Pierce protocol (the slotted ring) [91], or the delay insertion protocol [74]. Prime and Apollo [15] manufacture token rings; the Cambridge ring [87] and Spider [72] are examples of slotted rings; and DDLCN [74] is a delay insertion ring.

A baseband network refers to transmission of signals without modulation, and the entire spectrum of the medium (cable) is consumed by the signal. Baseband LAN's are typically accessed via a carrier sense multiple access/collision detection (CSMA/CD) protocol commonly referred to as Ethernet [84]. DEC, Intel, and Xerox support a product which runs the Ethernet protocol.

Broadband LAN's use frequency division multiplexing and divide the spectrum of the medium (twisted pair, coaxial cable, or fiber optics) into channels, each of which carries analog signals or modulated digital data. For example, some channels may be used for point-to-point data communication, other channels can utilize a contention protocol such as Ethernet, other channels may be assigned for video traffic, and yet other channels may be dedicated to voice traffic. Wang manufactures a broadband LAN.

Wide Area Networks: A wide area network (WAN) is a geographically dispersed collection of hosts and communication processors where the distances involved are large. Another common name for these networks is long haul networks. WAN's include ARPANET [78], Cyclades [93], TYMNET, and Telenet.

Commercial vendors and several standards organizations have developed computer network architectures. A network architecture describes the functionality of a computer network in a structured and layered manner. The ISO reference model [134] is typical of these architectures and it contains seven layers: the physical, data link, network, transport, session, presentation, and application layers. Some examples of commercial network architectures which are used in a wide variety of applications include: Burroughs' Network Architecture, DEC's DECNET, Honeywell's Distributed Systems Architecture, IBM's Systems Network Architecture, Siemens' TRANSDATA, and Univac's Distributed Communication Architecture [80]. Strictly speaking, network architectures exist on both LAN's and WAN's, but they are listed here because they were first conceived within the WAN context.

Network Operating Systems: Consider the situation where each of the hosts of a computer network has a local operating system that is independent of the network. The sum total of all the operating system software added to each host in order to communicate and share resources is called a network operating system (NOS). The added software often includes modifications to the local operating system. NOS's are characterized by being built on top of existing operating systems, and they attempt to hide the differences between the underlying systems. The most famous example of such a computer network is ARPANET, and it contains several NOS's, e.g., RSEXEC and NSW [71].
RSEXEC includes a uniform command language interpreter and a networkwide execution environment for user programs. It is intended as an experimental vehicle for exploring NOS issues. NSW is a NOS that supports software development by providing uniform access to a set of software tools. A systemwide file system is a major element of NSW. XNOS [61] is another example of a NOS.

Distributed Operating Systems: Consider an integrated computer network where there is one native operating system for all the distributed hosts. This is called a distributed operating system (DOS). A DOS is designed with the network requirements in mind from its inception, and it tries to manage the resources of the network in a global fashion. Therefore, retrofitting a DOS to existing operating systems and other software is not a problem in DOS's. Since DOS's are used to satisfy a wide variety of requirements, their various implementations are quite different. Table I lists a number of DOS's. Note that the boundary between NOS's and DOS's is not clearly distinguishable.

TABLE I
DISTRIBUTED OPERATING SYSTEMS

ACCENT [98], ADCOS [121], AEGIS [15], ArchOS [57], CAP [87], CHORUS [52], CLOUDS [77], DCS [38], DEMOS/MP [94], Domain Structure [26], HXDP [56], LOCUS [92], Medusa [90], MICROS [143], MIKE [138], RIG [18], ROSCOE [114], STAROS [59], TRIX [148], UNIX (6)^a, EDEN [53], WEB [68], Fully DP [37]

^a The (6) after UNIX indicates that there are at least six extensions to UNIX for distributed systems including LOCUS [92], [97], [140], and those extensions done at Bell Laboratories [75], Berkeley [102], Purdue, and Newcastle upon Tyne [96].

Distributed File Servers: A file system is an integral part of a NOS, a DOS, and a distributed database system. Hence, there has been a great deal of research in this area. In most distributed systems the file system is considered a server which fields requests from users as well as from the rest of the operating system. File servers support the illusion that there is a single logical file system when, in fact, there may be many different file systems. Depending on the level of sophistication implemented, the file server may support replication, movement of files, and reliable updates to files in addition to the common file system commands. Typical file servers are: the Cambridge file server [32], DFS [132], the Felix file server [43], ROE [40], Violet [46], and WFS [133]. See [85] for a comparison of the Xerox DFS file server and the Cambridge file server.

Distributed Real-Time Systems: Nuclear power plants and process control applications are inherently distributed and have severe real-time constraints and reliability requirements. These constraints add considerable complication to a DCS. Airline reservation and banking applications are also distributed, but have less severe real-time and reliability constraints and are easier to build. Examples of the more demanding real-time systems include ESS [19], REBUS [16], and SIFT [81].
ESS is a software controlled electronic switching system developed by the Bell System for placing telephone calls. The system meets severe real-time and reliability requirements. REBUS is a fault tolerant distributed system for industrial real-time control, and SIFT is a fault tolerant flight control system.

Distributed Databases: An important application of DCS's is to allow databases (in contrast to a file system) to be geographically distributed across the network while at the same time being logically integrated. This distribution often follows some natural distribution, such as the records of airline reservations being at the site of plane departures, or a company's inventory, personnel, orders, and production records residing at the appropriate locations. Increased reliability and performance can also be attained with a distributed database. The best known distributed database systems are Distributed Ingres [127], R* [72], and SDD-1 [21], [101]. Distributed Ingres, developed at the University of California at Berkeley, is based on the relational model and, as its name suggests, is an extension of Ingres. R* is also based on a relational model and is an IBM research project. SDD-1 has been built by Computer Corporation of America in a layered architecture and is specifically designed for reliability.

This list of actual DCS's serves to outline the various research areas that we consider in a combined manner in the remainder of the paper. It should be noted that many of the systems referenced in this section are experimental systems, each grappling with the fundamental research issues discussed in the next section. Performance data concerning well defined experiments run on these systems would provide valuable input to all researchers in their attempts to formulate solutions, models, and a general methodology and theory for distributed systems. On the other hand, even though these current systems do not theoretically achieve the full "potential" benefits of distributed computing, they are important systems in their own right, contributing to the state of the art and, in general, meeting their own specific requirements quite well.

III. FUNDAMENTAL ISSUES

This section provides a perspective on six interrelated DCS issues: the object model, access control, distributed control, reliability, heterogeneity, and efficiency. These fundamental issues are described, problems associated with these issues are identified, the relative importance of the issues is discussed, and the issues are viewed as topics which span the subnet, operating system, and database levels. It is suggested that solutions from each of these areas can be better utilized in all areas.

A. The Object Model

A data abstraction is a collection of information and a set of operations defined on that information. An object is an instantiation of a data abstraction. The concept of an object is usually supported by a kernel which may also define a primitive set of objects. Higher level objects are then constructed from more primitive objects in some structured fashion. All hardware and software resources of a DCS can be regarded as objects. The concept of an object and its implications form an elegant basis for a DCS [59], [60], [103], [144]. For example, a process can be considered as an execution trace through a dynamically changing collection of objects. The set of all objects required by a process at some instant of time is called a domain [26]. This includes code and environment objects such as process control blocks and file control blocks. The set of information needed when moving a process is then clearly delineated by the domain. There are other complications, of course, because some of the objects needed by a process may be owned entirely by this process, others may be simultaneously shareable with other processes, and yet others may be shared serially. The elegance of the object model is also witnessed in naming, addressing, and accessing objects, which can all be handled by "system" objects.
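To make the domain notion concrete, the following sketch is illustrative only (the names Obj, Process, and domain are ours, not from any of the systems cited above); it shows one way a kernel might view a process as a trace through objects, with its current domain being exactly the set of objects that must accompany a move:

from dataclasses import dataclass, field

@dataclass
class Obj:
    """An object: encapsulated state plus the operations defined on it."""
    name: str
    state: dict = field(default_factory=dict)

@dataclass
class Process:
    """A process viewed as an execution trace through objects."""
    pid: int
    trace: list = field(default_factory=list)   # objects touched so far, in order
    needed: set = field(default_factory=set)    # objects currently required

    def touch(self, obj: Obj):
        self.trace.append(obj.name)
        self.needed.add(obj.name)

    def domain(self):
        """The domain: every object (code, PCB, FCB, data) the process needs now."""
        return set(self.needed)

# Usage: migrating the process means shipping precisely process.domain().
p = Process(pid=7)
p.touch(Obj("code_segment"))
p.touch(Obj("process_control_block"))
p.touch(Obj("open_file_fcb"))
print(p.domain())   # the three objects above form the current domain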
Individual objects communicate with each other via well defined communication primitives (such as SEND/WAIT and SEND/NOWAIT) [122]. The idea of an object as described above is an abstraction that is independent of the distribution issues found in DCS's. It is almost trivial to extend the object abstraction to distributed systems. Typical distributed systems functions such as (static and dynamic) allocation of objects to a host, moving objects, remote access to objects, sharing objects across the network, and providing interfaces between disparate objects are all "conceptually" simple because they are all handled by yet other objects. The object concept is powerful, and most DOS's listed in Section II are based on objects.

In DCS's, complications arise when it is time to decide how an object is to perform its function. In many cases, though, this problem is reduced if an object model has been used. Consider a global scheduler that decides to move a process to another host on the network because it predicts improved response time. The fact that all the resources required by that process at a given time form an easily detectable collection of objects (a domain) facilitates this movement. As another example of the advantages of objects, consider a centralized name server that is causing a performance bottleneck and reliability problems. Reprogramming the name server in a distributed fashion might relieve these problems without affecting users of the name server object, because to users of the name server the interface has remained the same.

Objects also serve as the primitive entity in more complicated structuring of distributed computation. One type of distributed computation is a process which executes as a sequential trace through objects, but does so across multiple hosts (Fig. 1). Work such as [26], [127]-[129], [139] has been based on this definition. Another form of distributed computation is to have clusters of processes (collections of objects) running in parallel and communicating with each other in any number of ways based on the type of interprocess communication (IPC) used. The cluster of processes may be colocated or distributed in some fashion (Fig. 2). As an example, this form of computation is used in [59], [90], [125]. Other notions of what constitutes a distributed program are possible, but most are also based on an object model.

Fig. 1. Distributed computation_1. A distributed program = P_A + P_B + P_C, spread across Hosts 1-3; the numbers 1-5 indicate the order of the sequential trace through the program.

Fig. 2. Distributed computation_2. P1-P6 are processes (collections of objects and an execution trace through some objects) running in parallel; the figure's legend distinguishes SEND/WAIT communication from SEND/NOWAIT communication. Note: another variation would permit a process, say P4, to exist across hosts as in Fig. 1.

The major problem with object based systems has been poor execution time performance. However, this is not really a problem with the object abstraction itself, but is a problem with inefficient implementation of access to objects. The most common reason given for poor execution time is that current architectures are ill suited for object based systems. Another problem is choosing the right granularity for an object [49], [99]. If every integer or character and their associated operations are treated as objects, then the overhead is too high. If the granularity of an object is too large, then the benefits of the object based system are lost. Needed are better architectures, better performance analyses to choose good object granularity, and the ability to develop object based systems where compilers automatically produce optimized machine code, avoiding object overhead when feasible [120].

The object model is considered a fundamental issue of DCS research because it is a convenient primitive for DCS's, simplifying design, implementation, and extensibility.
Further, all of the other fundamental DCS issues can be discussed in terms of objects. This is illustrated throughout the remainder of the paper.

B. Access Control

A distributed system contains a collection of hardware and software resources, each of which can be modeled as an object. Accessing the resources [76] must be controlled in two ways. One, the manner used to access the resource must be suitable to the resource and requirements under consideration. For example, a printer or bus must be shared serially (mutual exclusion), local data of an object should be unshared, a read only file (an object) can be accessed simultaneously by any number of users, and multiple copies of a file must be consistent. Two, access to a resource must be restricted to the set of allowable users. This is usually done by an access control list, an access control matrix, or by capabilities. Dynamic changes to access rights of a user cause some difficulties, but solutions do exist [58]. In the remainder of this section we expound upon the first aspect of access control because it is more difficult and interesting than restricting the set of allowable users.

Serial sharing of a single resource can be enforced in any number of ways, including locking. However, in LAN's and WAN's we find many other interesting techniques. For example, serial access to a ring can be enforced by using a token protocol. Time division multiplexing is a static serial sharing of the communication medium. Polling is a technique where a central host queries stations, one at a time, as to whether they have traffic to send, thereby enforcing serial sharing. Reservation schemes, a type of collision free protocol, require stations to reserve future time slots on the communication medium to completely avoid conflict. Another class of access protocols, called limited contention protocols, allows contention for serial use of the bus under light loads to reduce delays, but essentially becomes a collision free protocol under heavy loads, thereby increasing channel efficiency. See [134] for a full description of collision free and limited contention protocols.

We believe that most ideas from the subnet research area can be better exploited in other parts of DCS's. For example, one idea which has already been used is a "virtual" token scheme [70] for concurrency control in a distributed database system. The token circulates on a virtual ring and carries with it a sequencer that delivers sequential and unique integer values called tickets. This scheme ensures serial access to data items by allowing access to an object only if the requestor holds the proper ticket. Another potential area for sharing ideas is in distributed time critical applications, where such applications could make use of reservation and limited contention protocols in controlling access to objects. This approach would better guarantee access within the time constraints.
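As a rough illustration of the circulating-ticket idea, the following is a simplified sketch under our own assumptions (hypothetical class names; it is not the actual protocol of [70]): the sequencer carried by the token hands out monotonically increasing tickets, and an object grants access only to the holder of the next expected ticket, so accesses are serialized.

import itertools

class Sequencer:
    """Carried by the circulating token; hands out unique, increasing tickets."""
    def __init__(self):
        self._counter = itertools.count(1)

    def next_ticket(self):
        return next(self._counter)

class SharedObject:
    """Grants operations strictly in ticket order, enforcing serial access."""
    def __init__(self):
        self._next_expected = 1

    def access(self, ticket, operation):
        if ticket != self._next_expected:
            return False          # not this requestor's turn yet; it must wait
        operation()               # perform the update serially
        self._next_expected += 1
        return True

# Usage: requestors on different hosts obtain tickets as the token passes by,
# then present them to the object; ticket numbers serialize the accesses.
seq = Sequencer()
obj = SharedObject()
t1, t2 = seq.next_ticket(), seq.next_ticket()
assert not obj.access(t2, lambda: None)   # ticket 2 must wait for ticket 1
assert obj.access(t1, lambda: None)
assert obj.access(t2, lambda: None)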
In addition to resources which must be shared serially, DCS's contain resources which can be shared simultaneously. Resources that can be shared simultaneously pose no difficulty if accessed individually. However, it is sometimes necessary to access a group of resources (objects) at the same time. This gives rise to transactions, which are used extensively in the database area. A transaction is an abstraction which allows programmers to group a sequence of actions into a logical unit. If executed atomically (either the entire transaction is executed or none of it is executed), a transaction transforms a current consistent state of the resources into a new consistent state. The virtues of transactions are well documented [51], [66] and are important enough that transactions have also appeared in distributed programming languages [73] and in distributed operating systems [77], [116], [131].

Multiple transactions may have access problems among themselves. Protocols for resolving data access conflicts between transactions are called concurrency control protocols [21], [69], [137]. Three major classes of concurrency control protocols are: locking, timestamp ordering [100], and validation (also called the optimistic approach [63]). Locking is a well known technique that is already used at all levels of a system. Timestamp ordering is an interesting approach where all accesses to data are timestamped and then some common rule is followed by all transactions in such a way as to ensure serial access [137]. This technique should be useful at all levels of a system, especially if timestamps are already being generated for other reasons, such as to detect lost messages or failed hosts. A variation of the timestamp ordering approach is the multiversion approach. This approach is interesting because it integrates concurrency control with recovery. In this scheme, each change to the database results in a new version. Concurrency control is enforced on a version basis and old versions serve as checkpoints. Handling multiple versions efficiently is accomplished by differential files [107]. Validation is a technique which permits unrestricted access to data items, but then there are checks for potential conflicts at the commit point. The commit point is the time at which a transaction is sure that it will complete. This approach is useful when few conflicts are expected.

The correctness of a concurrency control protocol is usually based on the concept of serializability [22]. Serializability means that the effect of a set of executed transactions (permitted to run in parallel) must be the same as some serial execution of that set of transactions. In many cases and at all levels the strict condition of serializability is not required. Relaxing this requirement can usually result in access techniques that are more efficient in the sense of allowing more parallelism and faster execution times. An example of this is the scheduling function. The scheduling algorithm is running in parallel on many hosts and must make decisions quickly. The scheduler would rather make quick decisions using somewhat inconsistent data than have to lock all the distributed information that it might use in arriving at a solution. A research problem is determining the value that the completeness and accuracy of information contributes to the quality of the result (in this case response time or throughput).
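A minimal sketch of the basic timestamp-ordering rule mentioned earlier in this section may help; this is our own simplification, not a particular protocol from [100] or [137]. Each data item remembers the largest read and write timestamps it has seen, and an operation that arrives "too late" causes its transaction to be rejected (and typically restarted with a new timestamp):

class Abort(Exception):
    """Raised when a transaction must be aborted and restarted with a new timestamp."""

class DataItem:
    def __init__(self, value=None):
        self.value = value
        self.max_read_ts = 0    # largest timestamp that has read this item
        self.max_write_ts = 0   # largest timestamp that has written this item

    def read(self, ts):
        """Reject a read that arrives after a younger transaction already wrote."""
        if ts < self.max_write_ts:
            raise Abort(f"read by T{ts} arrived too late")
        self.max_read_ts = max(self.max_read_ts, ts)
        return self.value

    def write(self, ts, value):
        """Reject a write that a younger transaction has already read or overwritten."""
        if ts < self.max_read_ts or ts < self.max_write_ts:
            raise Abort(f"write by T{ts} arrived too late")
        self.max_write_ts = ts
        self.value = value

# Usage: transactions T1 (timestamp 1) and T2 (timestamp 2) touch the same item.
x = DataItem(0)
x.write(2, 10)        # T2 writes first
try:
    x.read(1)         # T1's late read conflicts with T2's write and is rejected
except Abort as e:
    print("abort:", e)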
The concurrency control protocols, as well as the previously mentioned access control techniques of the subnet, are implemented across the entire spectrum from centralized to distributed control. Hence, access control is very closely related to the discussion in the next section on distributed control. Further, some access control may be done statically (e.g., scope rules in programming languages), but most access control must occur dynamically during system operation. For example, the data items to be accessed by a process may not be known prior to execution, and therefore access to these data items must be controlled dynamically. How dynamic access control is done affects the performance and fairness of the protocol. Normal access must sometimes be bypassed for external interrupts, for alarms, or for failures. Careful normal access control coupled with techniques for failure situations signifies that access control is also highly related to reliability. A particular access control technique must resolve deadlock and livelock, two other issues related to performance and reliability [48], [82]. Finally, we point out that objects and access to them are intimately related and of equal importance. Access control may be implemented by a distributed control protocol and has important effects on efficiency and reliability. It is a complicated problem to choose the right access control technique for a given set and type of objects. Ideas for solutions to the access control problem from the subnet, NOS, DOS, and database areas should be interchanged to obtain better performance and reliability.

C. Distributed Control

Depending on the application and requirements, distributed control can take on many forms. Although research has been active for all types of distributed control, the majority of the work is based on extensions to the centralized state-space model and can be more accurately described as decomposition techniques [135], [136], rather than distributed control. In such work, large scale problems are partitioned into smaller problems, each smaller problem being solved, for example, by mathematical programming techniques, and the separate solutions being combined via interaction variables. In many cases the interaction variables are considered negligible, and in others they are limited in the sense that they model very limited cooperation. See [55], [67], [105] for excellent surveys of these types of distributed control (decomposition). Note that in many of these cases it is an irrelevant detail that individual subproblems are solved in parallel on different computers, and reliability is often not an issue. Further, many of the techniques require extensive computing power, making them more suitable for application programs than for system functions.

For DCS's, another form of distributed control is also important; that is, distributed control where decomposition is not possible to any large degree and where cooperation among the distributed controllers is the most important aspect. The DCS environment also results in a set of additional very demanding requirements. For example, some functions implemented in this environment are dynamic, distributed, adaptive, asynchronous, and operate in noisy, error prone, and uncertain (stochastic) situations. Another important aspect of the environment is that significant delays in the movement of data are commonplace and must be accounted for.
It is important to note that the stochastic nature of the system being controlled affects two distinct aspects of the functions: an individual controller's view of the system is an estimate, and future random forces can affect the system independently of the control decision. Scheduling, startup of multiple distributed processes of a system or application program, and processes which test the reasonableness of distributed data are examples of this form of distributed control. We will now consider distributed control of this more demanding type across three levels: the subnet, the DOS, and the distributed database levels.

Functions in the subnet such as access control (discussed above), routing [44], [106], and congestion control are good candidates for distributed control. Routing is the decision process which determines the path a message follows in passing from its source to its destination. Some routing schemes are completely fixed; others contain fixed alternate paths where the alternative is chosen only on failures. These nonadaptive schemes are too limited and do not constitute distributed control. Adaptive routing schemes modify routes based on changing traffic patterns. Adaptive routing schemes may be centralized, where a routing control center calculates good paths and then distributes these paths to the individual hosts of the network in some periodic or aperiodic fashion. This is not distributed control either.

Routing algorithms which exhibit distributed control typically contain n copies of the algorithm (one at each communication processor). Information is exchanged among communication processors periodically or asynchronously as a result of some noticeable change in traffic. The information exchanged varies depending on the metric used by the algorithm (e.g., the metric might be number of hops, some estimate of delay to the destination, or buffer lengths). Each copy of the routing algorithm uses the exchanged (out-of-date) information in making routing decisions. Such algorithms have the potential for good performance and reliability because the distributed control can operate in the presence of failures and quickly adapt to changing traffic patterns. On the other hand, several new problems arise in such algorithms. If the algorithm is not careful, then phenomena known as ping-ponging (message looping) and poor reaction to "bad news" might occur [134]. These problems were recognized in the old ARPANET distributed routing algorithm, and various solutions now exist [79]. Rather than continuing the discussion of such algorithms, we simply note that there is a large degree of similarity between routing messages in the subnet and scheduling processes on hosts of the DCS. We consider scheduling in Section III-F, and the reader is encouraged to note the similarities.

Another type of distributed routing algorithm is based on "n" spanning trees being maintained in the network (one at each host). Each spanning tree identifies all "n" destinations in the DCS [83]. Each spanning tree is largely independent of the other trees, so this is not a highly cooperative type of distributed control. Such an approach does have a number of advantages, such as guaranteeing that there will be no looping of messages.
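The following sketch illustrates the metric-exchange style of distributed routing described above in a distance-vector flavor; it is our own minimal illustration, not the ARPANET algorithm. Each communication processor keeps an estimated cost to every destination, receives its neighbors' (possibly stale) estimates, and updates its own table accordingly:

INF = float("inf")

class Router:
    def __init__(self, name, neighbors):
        # neighbors: {neighbor_name: link_cost}, e.g., hop counts or delay estimates
        self.name = name
        self.neighbors = dict(neighbors)
        # routing table: destination -> (estimated_cost, next_hop)
        self.table = {name: (0.0, name)}

    def advertise(self):
        """The cost vector this router tells its neighbors about."""
        return {dest: cost for dest, (cost, _) in self.table.items()}

    def update_from(self, neighbor, their_vector):
        """Incorporate a (possibly out-of-date) cost vector received from a neighbor."""
        link = self.neighbors[neighbor]
        changed = False
        for dest, their_cost in their_vector.items():
            new_cost = link + their_cost
            old_cost = self.table.get(dest, (INF, None))[0]
            if new_cost < old_cost:
                self.table[dest] = (new_cost, neighbor)
                changed = True
        return changed   # a change would normally trigger a new advertisement

# Usage: hosts A-B-C connected in a line; after exchanges, A learns a route to C via B.
a = Router("A", {"B": 1})
b = Router("B", {"A": 1, "C": 2})
c = Router("C", {"B": 2})
b.update_from("C", c.advertise())
a.update_from("B", b.advertise())
print(a.table["C"])   # (3.0, 'B'): reach C through B at estimated cost 3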
When too many messages are in the subnet, performance degrades. Such a situation is called congestion. In fact, depending on the subnet protocols, it may happen that at high traffic performance collapses completely and almost no packets are delivered. Solutions include preallocation of buffers and performing message discarding only when there is congestion. This is usually done by choke packets [134].

A particularly interesting distributed control algorithm for congestion control is called isarithmic congestion control [134]. In this scheme a set of permits circulates around the subnet and a set of permits is fixed at each host. Whenever a communication processor wants to transmit a message, it must first acquire a permit, either one assigned to that site (and not being used) or a circulating permit. When a destination communication processor removes a message from the subnet, it regenerates the permit. Stationary permits are considered free upon message acknowledgment. This scheme limits the number of messages in the subnet to some maximum, given by the number of permits in the system. Can such a scheme be used for other aspects of DCS's, such as scheduling processes on processors, or improving transaction response time in a distributed database system by avoiding thrashing situations?
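A rough sketch of the isarithmic idea follows; it is our own toy model under simplifying assumptions (a single shared permit pool rather than circulating and per-host permits). The invariant it illustrates is that the number of messages in transit can never exceed the number of permits:

class PermitPool:
    def __init__(self, total_permits):
        self.free = total_permits     # upper bound on messages in the subnet

    def acquire(self):
        """A communication processor must hold a permit before injecting a message."""
        if self.free == 0:
            return False              # subnet "full": the sender must wait
        self.free -= 1
        return True

    def release(self):
        """The destination regenerates the permit when it removes the message."""
        self.free += 1

# Usage: with 2 permits, a third message cannot enter until one is delivered.
pool = PermitPool(total_permits=2)
assert pool.acquire()      # message 1 enters the subnet
assert pool.acquire()      # message 2 enters the subnet
assert not pool.acquire()  # message 3 must wait
pool.release()             # message 1 delivered; its permit is regenerated
assert pool.acquire()      # now message 3 may enter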
Research on various topics in distributed databases [22], [62] has been extensive. One of the main research issues is concurrency control. Various algorithms for concurrency control have appeared, including some based on distributed control. One such algorithm is found in the Sirius-Delta database system developed at INRIA [70]. In this system, integrity of the database is maintained by distributed controllers in the presence of concurrent users. The distributed controllers must somehow cooperate to achieve a systemwide objective of good performance subject to the data integrity constraint. In Sirius-Delta the cooperation is achieved by the combined principles of atomic actions and unique timestamps. It can then be proven that the multiple distributed controllers can operate concurrently and still meet the data integrity constraint. Bernstein and Goodman [22] have also shown that another class of algorithms, based on two-phase locking and atomic actions, can achieve this same cooperation. However, for other functions of distributed operating systems, such as scheduling, there is no requirement for the data integrity constraint. In fact, as stated before, if such a constraint were required for scheduling, then in general the performance of the system would suffer dramatically. Removing the data integrity constraint from the scheduling algorithm will improve its performance, but the control problem becomes much more difficult.

In the operating system arena, functions such as scheduling, deadlock detection, access control, and file servers are candidates for being implemented via distributed control. Consider an individual operating system function to be implemented by n distributed replicated entities (controllers). For reliability we require that there is no master controller; in other words, each of the entities is considered equal (democratic) at all times. Furthermore, one of the most demanding requirements is that most operating system functions must run in real time with minimum overhead (time sensitive). This requirement eliminates many potential solutions based on mathematical or dynamic programming. Central to the development of distributed control functions is the notion of what constitutes optimal control. However, such a notion for dynamic, democratic, decentralized, replicated, adaptive, stochastic, and time-sensitive functions has not yet been well formulated. In fact, this is such a demanding set of requirements that there are no mathematical techniques that are directly applicable.

To better illustrate the complexities involved, we discuss them in terms of the mathematical discipline of team theory [54]. At a high level of abstraction, team theory appears to be a good candidate discipline on which to base distributed control algorithms because it contains all the essential ingredients of the distributed control problem. The main ingredients are as follows: 1) the presence of different but correlated information for each decision maker about some underlying uncertainty; and 2) the need for coordinated actions on the part of all decision makers in order to realize the systemwide payoff.

More formally, a team-theoretic decision problem contains five components [54]:
1) A vector of random variables θ = [θ_1, θ_2, ..., θ_m] with given distribution p(θ). The random vector represents all the uncertainties that have a bearing on the problem and is called the states of nature.
2) A set of observations z = [z_1, z_2, ..., z_n] which are given functions of θ, i.e., z_i = η_i(θ_1, θ_2, ..., θ_m), i = 1, 2, ..., n. In general z_i is a vector and is the observation available to the ith decision maker.
3) A set of decision variables (or controls) u = [u_1, u_2, ..., u_n], one per decision maker. The variables θ, z, and u are assumed to take values in appropriate spaces Θ, Z, and U, respectively.
4) The strategy (decision rule) of the ith decision maker, a function u_i = γ_i(z_i), where γ_i is chosen from some admissible class of functions Γ_i.
5) The loss (payoff) criterion L(u_1, u_2, ..., u_n, θ_1, θ_2, ..., θ_m).

Finally, the team decision problem is: find γ_i ∈ Γ_i in order to minimize J = E[L(u = γ(η(θ)), θ)].

Note that this model is the simplest form of a team decision, and it yields a complicated optimization problem that, in general, could not be solved in real time on a DCS. The above model is completely static (one team decision is made) and the controls (actions) u_i of each decision maker do not depend on the actions of the other decision makers. Cooperation occurs only statically through the systemwide objective function and dynamically via changes in the states of nature. For most DCS's, component 2) must be modified to z_i = η_i(θ, u). In other words, the observations of a controller depend on both the states of nature and the decisions of the other decision makers. This team problem is unsolved today. In DCS functions, the problem is even more difficult than this unsolved team theory problem because there are additional complications, e.g., there are inherent delays in the system causing inaccuracies and eliminating the possibility of immediate response to actions, and decisions must be made quickly. Further, team theory does not directly deal with stability, an issue that is fundamental to distributed control.

To have any hope of solving the distributed control problem, we need to either relax our optimization requirement or impose more structure on the problem, or both. In effect, the distributed database problem has imposed additional structure by requiring data integrity. In general, imposing additional structure includes: 1) requiring that each controller act sequentially and also requiring the controller to know the action and the result of any action of all previous controllers; 2) various n-step delay approaches [54]; 3) periodic coordination [67]; or 4) using a centralized coordinator.
Even with such simplifications, the specification of additional structure does not guarantee that the resulting optimization problem is solvable in practice. Even with this additional structure, the optimization problem can be too complex (and costly) to run in real time for functions like scheduling and routing. Therefore, it is necessary to develop heuristics that can be run in real time and that can effectively coordinate distributed controllers. Even in heuristics, the delayed effects of the interactions are often not considered. Furthermore, both iterative solutions and keeping entire histories are impractical for most DCS functions. For the scheduling problem there is the added concern that it is difficult, if not impossible, to know the direct systemwide effect of a particular action taken by a controller. For example, assume that controller i takes action "a" and assume that the net effect of all the actions of all the controllers improves the system. It cannot be concluded that action "a" was a good action when, in fact, it may have been a bad action dominated by the good actions of other controllers. This is an aspect of what is referred to as the "assignment of credit" problem.

In summary, there are many forms of distributed control. Deciding which type is appropriate for each function in a DCS is difficult. Deciding how distributed control algorithms of different types will interact with each other under one system is even more complex. It is our hypothesis that the advantages of designing the proper algorithms in the right combinations will be improved performance, reliability, and extensibility, the major potential benefits of DCS's. Consequently, distributed control is a crucial issue.

D. Reliability

Reliability is a fundamental issue for any system, but DCS's are particularly well suited to the implementation of reliability techniques. Reliability is one fundamental issue where solutions have already been widely used across areas. However, we believe that recent reliability techniques in the database area should be better utilized at the operating system and subnet levels.

We begin the discussion on reliability with a few definitions. A fault is a mechanical or algorithmic defect which may generate an error. A fault may be permanent, transient, or intermittent. An error is an item of information which, when processed by the normal algorithms of the system, will produce a failure. A failure is an event at which a system violates its specifications. Reliability can then be defined as the degree of tolerance against errors and faults. Increased reliability comes from fault avoidance and fault tolerance. Fault avoidance results from practices such as the use of high reliability components and conservative design. Fault tolerance employs error detection and redundancy to deal with faults and errors. Most of what we discuss in this section relates to the fault tolerance aspect of reliability.

Reliability is a multidimensional activity that must simultaneously address some or all of the following: fault confinement, fault detection, fault masking, retries, fault diagnosis, reconfiguration, recovery, restart, repair, and reintegration [109]. Rather than simply discussing each of these in turn, we briefly discuss how these various issues are treated at the subnet, operating system, programming language, and database levels.

Frames on a subnet typically contain error detection codes such as the CRC.
Conservative design may appear as topology design with "n" paths to each destination. Data link protocols use handshaking techniques with positive feedback in the form of ACK and NAK messages, where the NAK messages may contain reasons for the failure. Timers are used in the subnet to react to lost messages, lost control tokens, or network partitionings. Most protocols employ retries to quickly overcome transient errors. Some subnets create an abstraction called a virtual circuit that guarantees the reliable transmission of messages. Flow control protocols attempt to avoid congestion, lost messages due to buffer overruns, and possible deadlock due to too much traffic and not enough buffer space. Routing algorithms contain multiple routes to each destination or a method of generating a new route given that failures occur. Alarms and other high priority messages are used to identify dangerous situations needing immediate attention. Many of the particular algorithms or protocols used are distributed to avoid single points of failure. Other typical reliability techniques used in any number of protocols include backup components, voting, consistency and range checks, and special testing procedures.

All the same techniques used in the subnet can also be used in the DOS [20]. Reliable DOS's should also support replicated files [13], [45], exception handlers, and testing procedures run from remote hosts, and should avoid single points of failure by a combination of replication, backup facilities, and distributed control. Distributed control could be used for file servers, name servers, scheduling algorithms, and other executive control functions. Process structure, how environment information is kept, the homogeneity of various hosts, and the scheduling algorithm may allow for relocatability of processes. Interprocess communication (IPC) might be supported as a reliable remote procedure call [88], [108]. Reliable IPC would enforce "at least once" or "exactly once" semantics depending on the type of IPC being invoked. Other DOS reliability issues relate to invoking processes that are not active, attempting to communicate with terminated processes, or the situation in which a process remains active but is not used.

ARGUS [73], a distributed programming language, has explicitly incorporated reliability concerns into the programming language. It does this by supporting the idea of an atomic object, transactions, nested actions, reliable remote procedure calls, stable variables, guardians (which are modules that survive node failures and synchronize concurrent access to data), exception handlers, periodic and background testing procedures, and recovery of a committed update given that the present update does not complete. A distributed program written in ARGUS may potentially experience deadlock. Currently, deadlocks are broken by timing out and aborting actions.

Distributed databases make use of many reliability features such as stable storage, transactions, nested transactions [86], commit and recovery protocols [112], nonblocking commit protocols [34], [110], termination protocols [111], checkpointing, replication, primary/backups, logs/audit trails, differential files [107], and timeouts to detect failures. Operating system support is required to make these mechanisms more efficient [50], [131].
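As one concrete illustration of a commit protocol of the kind listed above, the following is a minimal two-phase commit sketch. It is our own simplified, failure-free rendering; the real nonblocking commit and termination protocols of [34], [110], [111] must additionally cope with lost messages and crashed hosts.

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "working"

    def prepare(self):
        """Phase 1: vote, after forcing enough to stable storage to honor the vote."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, decision):
        """Phase 2: apply the coordinator's global decision."""
        self.state = decision

def two_phase_commit(participants):
    # Phase 1: collect votes; any "no" vote forces a global abort.
    votes = [p.prepare() for p in participants]
    decision = "committed" if all(votes) else "aborted"
    # Phase 2: broadcast the decision so every site ends in the same state.
    for p in participants:
        p.finish(decision)
    return decision

# Usage: one unwilling participant aborts the whole transaction atomically.
sites = [Participant("A"), Participant("B", can_commit=False), Participant("C")]
print(two_phase_commit(sites))             # aborted
print({p.name: p.state for p in sites})    # all sites agree on the outcome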
From the above list of database reliability features, let us consider termination and recovery protocols. A termination protocol is used in conjunction with nonblocking commit protocols and is invoked at failure detection time to guarantee transaction atomicity. It attempts to terminate (commit or abort) all affected transactions at all participating hosts without waiting for recovery. This is an extremely important feature when it is necessary to allow as much continued execution as possible (availability) in spite of the failure. A host which has failed must then execute a recovery protocol before it resumes communication with other hosts. The major functions of the recovery protocol are to restart the system processes and to reestablish consistent transaction states for all transactions affected by the failure, if this has not already been accomplished by the termination protocol. This illustrates the close interaction that exists between the various protocols, where decisions made in one protocol make subsequent protocols easier or more difficult.

It is obvious that a termination (cleanup) and recovery protocol are required at all levels in the system. Hence, specific algorithms (or ideas from them) used at the database level might be applicable to other levels as well, and vice versa. For example, termination and recovery protocols may themselves be distributed to enhance reliability. However, the distributed termination protocols typically require n(n - 1) messages during a round of communication (and several rounds may be necessary), where n is the number of participating entities. This is too costly for slow networks, but it may be acceptable on fast local networks or within a single host. The benefits would be greater reliability and better availability. Note that, as was true for concurrency control protocols (Section III-B), further improvements in efficiency and availability might be possible if these termination and recovery protocols relax the serializability requirement.

One aspect of reliability not stressed enough in DCS research is the need for robust solutions, i.e., the solutions must explicitly assume an unreliable network and tolerate host failures, network partitionings, and lost, duplicate, out of order, or noisy data [27]. Robust algorithms must sometimes make decisions after reaching only approximate agreement or by using statistical properties of the system (assumed known or dynamically calculated). A related question is: at what level should the robust algorithms, and reliability in general, be supported? Most systems attempt to have the subnet ensure reliable, error free data transmission between processes. However, according to the end-to-end argument [104], such functions placed at the lower levels of the system are often redundant and unnecessary. The rationale for this argument is that since the application has to take into account errors introduced not only by the subnet, many of the error detection and recovery functions can be correctly and completely provided only at the application level.

The relationship of reliability to the other issues discussed in this paper is very strong. For example, object oriented systems confine errors to a large degree, define a consistent system state to support rollback and restart, and limit propagation of rollback activities. Since objects can represent unreliable resources (such as processors and disks), and since higher level objects can be built using lower level objects, the goal of reliable system design is to create "reliable" objects out of unreliable objects.
For example, a stable storage can be created out of several disk objects and the proper logic. Then a physical processor, a checkpointing capability, a stable storage, and logic can be used to create a stable processor. One can proceed in this fashion to create a very reliable system. The main drawback is potential loss of execution time efficiency. For many systems it is just too costly to incorporate an extensive number of reliability mechanisms. Reliability is also enhanced by proper access control and judicious use of distributed control, two other fundamental issues discussed in this paper. The major challenge is to integrate solutions to all these issues in a cost effective manner and produce an extremely reliable system.

E. Heterogeneity

Incompatibility problems arise in heterogeneous DCS's in a number of ways [14], [17], [71] and at all levels. First, incompatibility is due to the different internal formatting schemes that exist in a collection of different communication and host processors. Second, incompatibility also arises from the differences in communication protocols and topology when networks are connected to other networks via gateways. Third, major incompatibilities arise due to different operating systems, file servers, and database systems that might exist on a (set of) network(s).

The easiest solution to this general problem for a single DCS is to avoid the issue by using a homogeneous collection of machines and software. If this is not practical, then some form of translation is necessary. Some earlier systems left this translation to the user. This is no longer acceptable. Translation done by the DCS can be performed at the receiver host or at the source host. If it is done at the receiver host, then the data traverse the network in their original form. The data usually are supplemented with extra information to guide the translation. The problem with this approach is that at every host there must be a translator to convert each format in the system to the format used on the receiving host. When there exist "n" different formats, this requires the support of (n - 1) translators at each host. Performing the translation at the source host before transmitting the data is subject to all the same problems.

There are two better solutions, each applicable under different situations: an intermediate translator, or an intermediate standard data format. An intermediate translator accepts data from the source and produces the acceptable format for the destination. This is usually used when the number of different types of necessary conversions is small. For example, a gateway linking two different networks acts as an intermediate translator. For a given conversion problem, if the number of different types to be dealt with grows large, then a single intermediate translator becomes unmanageable. In this case, an intermediate standard data format (interface) is declared, hosts convert to the standard, data are moved in the format of the standard, and another conversion is performed at the destination. By choosing the standard to be the most common format in the system, the number of conversions can be reduced.

At a high level of abstraction, the heterogeneity problem and the necessary translations are well understood. However, implementing the translators in a cost effective way has not been achieved in general.
Complicated issues are precision loss, format incompatibilities (e.g., a minus zero value in sign magnitude or 1's complement cannot be represented in 2's complement), data type incompatibilities (e.g., mapping an upper/lower case terminal to an upper case only terminal loses information), efficiency concerns, the number and locations of the translators, and what constitutes a good intermediate data format for a given incompatibility problem.

As DCS's become more integrated, one can expect that both programs and complicated forms of data might be moved to heterogeneous hosts. How will a program run on such a host given that the host has a different word length, different machine code, and different operating system primitives? How will database relations stored as part of a CODASYL database be converted to a relational model and its associated storage scheme? Moving a data structure object requires knowledge about the semantics of the structure (e.g., that some of the fields are pointers and these have to be updated upon a move). How should this information be imparted to the translators, what are the limitations if any, and what are the benefits and costs of having this kind of flexibility? In general, the problem of providing translation for movement of data and programs between heterogeneous hosts and networks has not been solved. The main problem is ensuring that such programs and data are interpreted correctly at the destination host. In fact, the more difficult problems in this area have been largely ignored.

It is inevitable that incompatibilities will exist in DCS's because it is quite natural to extend such systems by interconnecting networks, by adding new hosts and communication processors, and by increasing functionality with new software. Further, the main function of NOS's and file servers is precisely to present a uniform logical interface (view) to the end user from a collection of different environments. Depending on the degree of incompatibility, the cost of the translations can be high, thereby limiting their application in DCS's which have severe real-time constraints. While real-time systems should also be as extensible as possible, they will probably have to rely on a few translators, on good object-based design, and on extensible distributed control algorithms rather than on being able to incrementally add more and more translators. For all DCS's, a method is needed to limit the number of incompatibilities during the lifetime of the system, while allowing significant and easy extensibility. The problems associated with heterogeneity are currently considered less important than the other problems considered in this paper because they can be handled on a problem by problem basis. However, as DCS's become more sophisticated and approach achieving their full potential, we believe the heterogeneity issue will become increasingly important and a problem by problem solution may not work.

F. Efficiency

Distributed computer systems are meant to be efficient in a multitude of ways. Resources (files, compilers, debuggers, and other software products) developed at one host can be shared by users on other hosts, limiting duplicated effort. Expensive hardware resources can also be shared, minimizing costs. Communication facilities such as electronic mail and file transfer protocols also improve efficiency by enabling better and faster transfer of information.
The multiplicity of processing elements might also be exploited to improve response time and throughput of user processes. While efficiency concerns exist at every level in the system, they must also be treated as an integrated "system" level issue. For example, a good design, the proper tradeoffs between levels, and the paring down of overambitious features usually improve efficiency. In this section, however, we concentrate on discussing efficiency as it relates to the execution time of processes.

Once the system is operational, improving response time and throughput of user processes is largely the responsibility of scheduling and resource management algorithms [28], [30], [35], [95], [124], [125], [127]-[129]. The scheduling algorithm is intimately related to the resource allocator because a process will not be scheduled for the CPU if it is waiting for a resource. If a DCS is to exploit the multiplicity of processors and resources in the network, it must contain more than "n" independent local schedulers. The local schedulers must interact and cooperate, and the degree to which this occurs can vary widely. We suggest that a good scheduling algorithm for a DCS will be a heuristic that acts like an "expert system." This expert system's task is to effectively utilize the resources of the entire distributed system given a complex and dynamically changing environment. We hope to illustrate this in the following discussion. In the remainder of this section, when we refer to the scheduling algorithm we are referring to the part of the scheduler (the expert system) that is responsible for choosing the host of execution for a process. We assume that there is another part of the scheduler which assigns the local CPU to the highest priority ready process.

We divide the characteristics of a DCS which influence response time and throughput into two types: 1) system characteristics, and 2) scheduling algorithm characteristics. System characteristics include: the number, type, and speed of processors, the allocation of data and programs [29], whether data and programs can be moved, the amount and location of replicated data and programs, how data are partitioned, partitioned functionality in the form of dedicated processors, any special purpose hardware, characteristics of the communication subnet, and special problems of distribution such as the absence of a central clock and the inherent delays in the system. A good scheduling algorithm would take the system characteristics into account. Scheduling algorithm characteristics include: the type and amount of state information used, how and when that information is transmitted, how that information is used (degree and type of cooperation between distributed scheduling entities), when the algorithm is invoked, adaptability of the algorithm, and the stability of the algorithm. By the type of state information, it is meant whether the algorithm uses queue lengths, CPU utilization, amount of free memory, estimated average response time, etc., in making its scheduling decision. The type of information also refers to whether the information is local or networkwide. For example, a scheduling algorithm on host 1 could use the queue lengths of all the hosts in the network in making its decision. The amount of state information refers to the number of different types of information used by the scheduler. Information used by a scheduler can be transmitted periodically or asynchronously.
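The following minimal sketch shows one hypothetical form such a state record might take, together with a simple rule for when a host might report it; the field names, the reporting period, and the change threshold are invented for illustration and merely anticipate the transmission options discussed next.

```python
import time
from dataclasses import dataclass

@dataclass
class HostState:
    # The "type and amount" of state information is a design choice;
    # these particular fields are only an illustration.
    host: str
    queue_length: int
    cpu_utilization: float   # 0.0 .. 1.0
    free_memory_kb: int
    timestamp: float         # local clock reading; clocks are not synchronized

def should_send(prev: HostState, cur: HostState,
                period: float = 5.0, delta: float = 0.25) -> bool:
    """Report periodically, or asynchronously when the load has changed by
    more than `delta` since the last report (both thresholds hypothetical)."""
    return (cur.timestamp - prev.timestamp >= period or
            abs(cur.cpu_utilization - prev.cpu_utilization) >= delta)

prev = HostState("h1", 3, 0.40, 2048, time.time())
cur = HostState("h1", 9, 0.80, 1024, time.time())
print(should_send(prev, cur))   # True: utilization jumped by 0.40
```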
If the information is sent asynchronously, it may be sent only when requested (as in bidding), it may be piggybacked on other messages between hosts, or it may be sent only when conditions change by some amount. The information may be broadcast to all hosts, sent to neighbors only, or sent to some specific set of hosts. The information is used to estimate the loads on other hosts of the network in order to make an informed global scheduling decision. However, the data received are out of date, and even the ordering of events might not be known [64]. It is necessary to manipulate the data in some way to obtain better estimates. Several examples are: very old data can be discarded; given that state information is timestamped, a linear estimation of the state extrapolated to the current time might be feasible; conditional probabilities on the accuracy of the state information might be calculated in parallel with the scheduler by some monitor nodes and applied to the received state information; the estimates can be some function of the age of the state information; or some form of (iterative) message interchange might be feasible. A message interchange is subject to long delays before the scheduling decision is made, and if mutual agreement among scheduling entities is necessary even in the presence of failures, then the interchange is also prone to the Byzantine Generals Problem [33], [65]. The Byzantine Generals Problem is a particularly disruptive type of problem where no assumptions can be made about the type of failure of a process involved in message exchanges. For example, the failed process can send messages when it is not supposed to, can make conflicting claims to other processes, or can act dead for a while and then revive. Even though it is hard to believe that a process would act like this on purpose, in practice systems fail in unexpected ways, giving rise to these kinds of behavior. Protecting against this type of failure is a conservative approach to reliable systems design.

Before a process is actually moved, the cost of moving it must be accounted for in determining the estimated benefit of the move. This cost is different if the process has not yet begun execution than if it is already in progress. In both cases, the resources required must also be considered. If a process is in execution, then environment information (e.g., the process control blocks) probably should be moved with the process. It is expected that in many cases the decision will be not to move the process.

Schedulers invoked too often will produce excessive overhead; if they are not invoked often enough, they will not be able to react fast enough to changing conditions, and there will be undue startup delay for processes. There must be some ideal invocation schedule which is a function of the load. In a complicated DCS environment it can be expected that the scheduler will have to be quite adaptive [12], [24], [123]. A scheduler might make minor adjustments in weighing the importance of various factors as the network state changes in an attempt to track a slowly changing environment. Major changes might require major adjustments. For example, under very light loads there does not seem to be much justification for networkwide scheduling, so the algorithm might be turned off, except for the part that can recognize a change in the load. At moderate loads, the full blown scheduling algorithm might be employed.
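A minimal sketch of two of these ideas follows: discarding or down-weighting stale state information, and skipping networkwide scheduling entirely when the local load is light. The aging function, the thresholds, and the report format are hypothetical choices made only for the example.

```python
import time

def estimate_load(reports, now, max_age=30.0):
    """Estimate a remote host's load from timestamped state reports:
    discard reports older than max_age and weight the rest by recency."""
    fresh = [r for r in reports if now - r["timestamp"] <= max_age]
    if not fresh:
        return None                          # no usable information
    weights = [1.0 / (1.0 + now - r["timestamp"]) for r in fresh]
    loads = [r["queue_length"] for r in fresh]
    return sum(w * q for w, q in zip(weights, loads)) / sum(weights)

def choose_host(local_host, local_queue, remote_reports, now, light=2):
    """Under light local load, skip networkwide scheduling altogether;
    otherwise pick the host whose estimated queue is shortest."""
    if local_queue <= light:
        return local_host                    # not worth the overhead
    best, best_load = local_host, float(local_queue)
    for host, reports in remote_reports.items():
        est = estimate_load(reports, now)
        if est is not None and est < best_load:
            best, best_load = host, est
    return best

now = time.time()
reports = {"h2": [{"timestamp": now - 4, "queue_length": 1}],
           "h3": [{"timestamp": now - 120, "queue_length": 0}]}  # too stale
print(choose_host("h1", 6, reports, now))    # "h2"; h3's old report is discarded
```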
The full blown algorithm might include individual hosts refusing all requests for information and refusing to accept any process because they are too busy. Under heavy loads on all hosts it again seems unnecessary to use networkwide scheduling. A bidding scheme might use both source and server directed bidding [113], [125]. An overloaded host asks for bids and is the source of work for some other host in the network. Similarly, a lightly loaded host may make a reverse bid, i.e., it asks the rest of the network for some work. The two types of bidding might coexist. Schedulers could be designed in a multilevel fashion with decisions being made at different rates, e.g., local decisions and state information updates occur frequently, but more global exchanges of decisions and state information might proceed at a slower rate because of the inherent cost of these global actions.

Stability [25] refers to the situation where processes are moved among hosts in the network in an incremental and orderly fashion. It is not acceptable for N - 1 hosts to flood a lightly loaded host in such a manner that the previously lightly loaded host must now reassign some of the work moved to it. (Some form of hysteresis is required; a minimal sketch appears below.) Scheduling algorithms can employ implicit or explicit stability mechanisms. An implicit mechanism exists when the algorithm is tuned so that the relative importance of the various factors used, the relative timings of the scheduling algorithm, the passing of state information, the characteristics of the processes in the system, the adaptability of the algorithm, etc., are all integrated in the right proportion to provide a stable system. Implicit treatment of stability can be dangerous, but requires less overhead than explicit mechanisms. Explicit mechanisms refer to specific logic and information used to better guarantee stability. The overheads of explicit mechanisms can be very high, and as one tries to lower the overheads, stability becomes jeopardized. It is not clear which technique is better.

An important part of efficiency is adequate measurement techniques. Many of the issues raised above require measurement methods. It might be necessary to measure the delay in the subnet, or between two hosts, or the utilization of a host, or the probabilities of certain conditions in the distributed system. The cost of the measurement must be weighed against the benefits it produces. A classic efficiency question in any system is: what should be supported by the kernel, or more generally by the operating system, and what should be left to the user? The trend in DCS's is to support objects, primitive IPC mechanisms, and processes in the kernel [115]. Some researchers advocate supporting the concept of a transaction in the kernel. This argument will never be settled conclusively since it is a function of the requirements, the type of processes running, etc. This is the classical Vertical Migration question [119]. Of course, many other efficiency questions remain at all levels, including those briefly discussed throughout the previous sections of this paper. These include: the efficiency of the object model, the end-to-end argument [104], locking granularity, performance of remote operations, improvements due to distributed control, the cost effectiveness of various reliability mechanisms, and efficiently dealing with heterogeneity.
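Returning to the stability point above, the following minimal sketch shows one explicit hysteresis mechanism a sender-initiated scheduler might use. The queue-length gap, the per-decision move limit, and the data layout are hypothetical parameters chosen only for the example.

```python
def migration_plan(local_queue, remote_queues, hysteresis=4, max_moves=2):
    """Decide which processes (if any) to offload.  Work moves only when the
    local queue exceeds a remote queue by more than the hysteresis band, and
    at most max_moves processes are moved per decision, so that a lightly
    loaded host is not flooded and forced to reassign work it just received."""
    plan = []
    for host, remote_q in sorted(remote_queues.items(), key=lambda kv: kv[1]):
        while (len(plan) < max_moves and
               local_queue - (remote_q + 1) > hysteresis):
            plan.append(host)        # move one process to this host
            local_queue -= 1
            remote_q += 1
    return plan

# A host with 12 queued processes and two lightly loaded neighbours:
print(migration_plan(12, {"h2": 1, "h3": 2}))   # ['h2', 'h2'] -- incremental
```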
Efficiency, then, is not a separate issue but one that must be addressed for each of the issues above in order to achieve an efficient, reliable, and extensible DCS. A difficult question to answer is exactly what constitutes acceptable performance, given that multiple decisions are being made at all levels and that these decisions are being made in the presence of missing and inaccurate information.

IV. SUMMARY

While it is true that DCS's have proliferated, it is also true that there remain many unsolved problems relating to the issues of the object model, access control, distributed control, reliability, heterogeneity, and efficiency, as well as their interactions. These fundamental issues have been recognized for some time, but solutions to problems associated with these issues have not produced totally satisfactory systems. We will not achieve the full potential advantages of DCS's until better experimental evidence is obtained and until current and new solutions are better integrated in a systemwide, flexible, and cost effective manner. Major breakthroughs will be required to achieve this potential. It is our opinion that such breakthroughs will be largely based on distributed decision making that will necessarily use heuristics similar to those found in "expert" systems. The heuristics, though, will have to directly address the problems of distribution, such as long delays, the assignment of credit, missing and out-of-date information, the use of statistical information, and failure events. Further, the heuristics will have to deal with such complexity very efficiently, and this eliminates many classical solutions. These complications are not typically found in expert systems to date, so it is not possible to simply "borrow" the solution. To achieve the objectives of DCS's, it is important to study distributed resource management paradigms that view resources at an integrated "system" level which includes the hardware, the communication subnet, the operating system, the database system, the programming language, and other software resources. To this end, this paper has tried to take a system viewpoint in presenting six fundamental issues. A DCS of the future will be a form of an extensible, adaptable, physically distributed, but logically integrated, expert system.

In summary, this paper has presented a brief overview of distributed computer systems, and then discussed some of the problems and solutions for each of six fundamental distributed systems issues. The paper has described (by means of examples from the communication subnet, distributed operating system, and distributed database areas) the interactions among these issues, and the need for better integration of solutions. Important issues that this paper has not covered due to lack of space are the need for a theory and specification language for distributed systems, as well as the need for a distributed systems methodology.

REFERENCES

[1] G. R. Andrews and F. Schneider, "Concepts and notations for concurrent programming," ACM Comput. Surveys, vol. 15, no. 1, Mar. 1983.
[2] P. Green, "An introduction to network architectures and protocols," IEEE Trans. Commun., vol. COM-28, Apr. 1980.
[3] L. Kleinrock and M. Gerla, "Flow control: A comparative survey," IEEE Trans. Commun., vol. COM-28, Apr. 1980.
[4] M. Schwartz and T. E. Stern, "Routing techniques used in computer communication networks," IEEE Trans. Commun., vol. COM-28, Apr. 1980.
[5] D. Walden and A. McKenzie, "The evolution of host to host protocol technology," IEEE Computer, vol. 12, Sept. 1979.
[6] D. W. Davies and D. L. A. Barber, Communication Networks for Computers. New York: Wiley, 1973.
[7] D. W. Davies, D. L. A. Barber, W. L. Price, and C. M. Solomonides, Computer Networks and Their Protocols. New York: Wiley, 1979.
[8] W. R. Franta and I. Chlamtac, Local Networks. Lexington, MA: Lexington Books, 1981.
[9] J. Martin, Computer Networks and Distributed Processing. Englewood Cliffs, NJ: Prentice-Hall, 1981.
[10] J. McNamara, Technical Aspects of Data Communication. Maynard, MA: Digital, 1977.
[11] C. Weitzman, Distributed Micro/Minicomputer Systems. Englewood Cliffs, NJ: Prentice-Hall, 1980.
[12] A. K. Agrawala, S. K. Tripathi, and G. Ricart, "Adaptive routing using a virtual waiting time technique," IEEE Trans. Software Eng., vol. SE-8, Jan. 1982.
[13] P. Alsberg and J. Day, "A principle for resilient sharing of distributed resources," in Proc. 2nd Int. Conf. Software Eng., 1976.
[14] B. Anderson et al., "Data reconfiguration service," Bolt Beranek and Newman, Tech. Rep., May 1971.
[15] Apollo Domain Architecture, Apollo Computer, Inc., Feb. 1981.
[16] J. M. Ayache, J. P. Courtiat, and M. Diaz, "REBUS, a fault tolerant distributed system for industrial control," IEEE Trans. Comput., vol. C-31, July 1982.
[17] M. Bach, N. Coguen, and M. Kaplan, "The ADAPT system: A generalized approach towards data conversion," in Proc. 5th Int. Conf. Very Large Data Bases, Rio de Janeiro, Brazil, Oct. 1979.
[18] J. E. Ball, J. Feldman, J. Low, R. Rashid, and P. Rovner, "RIG, Rochester's intelligent gateway: System overview," IEEE Trans. Software Eng., vol. SE-2, no. 4, Dec. 1980.
[19] D. K. Barclay, E. R. Byrne, and F. K. Ng, "A real-time database management system for No. 5 ESS," Bell Syst. Tech. J., vol. 61, no. 9, Nov. 1982.
[20] J. N. Bartlett, "A non-stop operating system," in Proc. 11th Hawaii Int. Conf. Syst. Sci., Jan. 1978.
[21] P. A. Bernstein, D. W. Shipman, and J. B. Rothnie, Jr., "Concurrency control in a system for distributed databases (SDD-1)," ACM Trans. Database Syst., vol. 5, no. 1, pp. 18-25, Mar. 1980.
[22] P. Bernstein and N. Goodman, "Concurrency control in distributed database systems," ACM Comput. Surveys, vol. 13, no. 2, June 1981.
[23] A. Birrell, R. Levin, R. Needham, and M. Schroeder, "Grapevine: An exercise in distributed computing," Commun. ACM, vol. 25, pp. 260-274, Apr. 1982.
[24] S. H. Bokhari, "Dual processor scheduling with dynamic reassignment," IEEE Trans. Software Eng., vol. SE-5, no. 4, July 1979.
[25] R. M. Bryant and R. A. Finkel, "A stable distributed scheduling algorithm," in Proc. 2nd Int. Conf. Distrib. Comput. Syst., Apr. 1981.
[26] L. Casey and N. Shelness, "A domain structure for distributed computer systems," in Proc. 6th ACM Symp. Oper. Syst. Princ., Nov. 1977, pp. 101-108.
[27] T. C. K. Chou and J. A. Abraham, "Load redistribution under failure in distributed systems," IEEE Trans. Comput., vol. C-32, pp. 799-808, Sept. 1983.
[28] --, "Load balancing in distributed systems," IEEE Trans. Software Eng., vol. SE-8, no. 4, July 1982.
[29] W. W. Chu, "Optimal file allocation in a multiple computing system," IEEE Trans. Comput., vol. C-18, pp. 885-889, Oct. 1969.
[30] W. W. Chu, L. J. Holoway, M. Lan, and K. Efe, "Task allocation in distributed data processing," IEEE Computer, vol. 13, pp. 57-69, Nov. 1980.
[31] D. W. Davies, E. Holler, E. D. Jensen, S. R. Kimbleton, B. W. Lampson, G. LeLann, K. J. Thurber, and R. W. Watson, Distributed Systems - Architecture and Implementation, Vol. 105, Lecture Notes in Computer Science. Berlin: Springer-Verlag, 1981.
[32] J. Dion, "The Cambridge file server," ACM Oper. Syst. Rev., Oct. 1980.
[33] P. Dolev, "The Byzantine generals strike again," J. Algorithms, vol. 3, no. 1, 1982.
[34] C. Dwork and D. Skeen, "The inherent cost of nonblocking commitment," Dep. Comput. Sci., Cornell Univ., Ithaca, NY, Tech. Rep., May 1983.
[35] K. Efe, "Heuristic models of task assignment scheduling in distributed systems," IEEE Computer, vol. 15, June 1982.
[36] P. Enslow, "What is a distributed data processing system," IEEE Computer, vol. 11, Jan. 1978.
[37] P. Enslow and T. Saponas, "Distributed and decentralized control in a fully distributed processing system," Tech. Rep. GIT-ICS-81/82, Sept. 1980.
[38] D. J. Farber et al., "The distributed computer system," in Proc. 7th Annu. IEEE Comput. Soc. Int. Conf., Feb. 1973.
[39] W. D. Farmer and E. E. Newhall, "An experimental distributed switching system to handle bursty computer traffic," in Proc. ACM Symp. Probl. Opt. Data Commun. Syst., 1969.
[40] R. A. Floyd and C. S. Ellis, "The ROE file system," in Proc. 3rd Symp. Reliability Distrib. Software Database Syst., Oct. 1983.
[41] H. C. Forsdick, R. E. Schantz, and R. H. Thomas, "Operating systems for computer networks," IEEE Computer, vol. 11, Jan. 1978.
[42] A. G. Fraser, "Spider - An experimental data communications system," Bell Labs., Tech. Rep., 1975.
[43] M. Fridrich and W. Older, "The FELIX file server," in Proc. 8th Symp. Oper. Syst. Princ. (SIGOPS), Dec. 1981, pp. 37-44.
[44] R. Gallager, "A minimum delay routing algorithm using distributed computation," IEEE Trans. Commun., vol. COM-25, Jan. 1977.
[45] J. Garcia-Molina, "Reliability issues for fully replicated distributed databases," IEEE Computer, vol. 16, pp. 34-42, Sept. 1982.
[46] D. Gifford, "Weighted voting for replicated data," in Proc. 7th Symp. Oper. Syst. Princ., Dec. 1979, pp. 150-159.
[47] --, "Violet: An experimental decentralized system," Oper. Syst. Rev., vol. 13, no. 5, Dec. 1979.
[48] V. D. Gligor and S. H. Shattuck, "On deadlock detection in distributed systems," IEEE Trans. Software Eng., vol. SE-6, no. 5, pp. 435-440, Sept. 1980.
[49] J. N. Gray, R. A. Lorie, and G. R. Putzolu, "Granularity of locks in a shared database," in Proc. Int. Conf. Very Large Data Bases, Sept. 1975, pp. 428-451.
[50] J. N. Gray, "Notes on data base operating systems," in Operating Systems: An Advanced Course. Berlin: Springer-Verlag, 1979.
[51] --, "The transaction concept: Virtue and limitations," in Proc. Int. Conf. Very Large Data Bases, Sept. 1981, pp. 144-154.
[52] M. Guillemont, "The Chorus distributed operating system: Design and implementation," in Proc. Int. Symp. Local Comput. Networks, Florence, Italy, Apr. 1982.
[53] J. Hamilton, "Functional specification of the WEB kernel," DEC RD Group, Maynard, MA, Nov. 1978.
[54] Y.-C. Ho, "Team decision theory and information structures," Proc. IEEE, vol. 68, June 1980.
[55] R. A. Jarvis, "Optimization strategies in adaptive control: A selective survey," IEEE Trans. Syst., Man, Cybern., vol. SMC-5, Jan. 1975.
[56] E. D. Jensen, "The Honeywell experimental distributed processor - An overview of its objective, philosophy and architectural facilities," IEEE Computer, vol. 11, Jan. 1978.
[57] E. D. Jensen and N. Pleszkoch, "ArchOS: A physically dispersed operating system," Distrib. Processing Tech. Comm. Newsletter, Summer 1984.
[58] A. K. Jones, "The object model: A conceptual tool for structuring software," in Lecture Notes in Computer Science, Vol. 60. Berlin: Springer-Verlag, 1978.
[59] A. K. Jones, R. J. Chansler, Jr., I. Durham, K. Schwans, and S. R. Vegdahl, "StarOS, a multiprocessor operating system for the support of task forces," in Proc. 7th Symp. Oper. Syst. Princ., Dec. 1979.
[60] K. C. Kahn et al., "iMax: A multiprocessor operating system for an object-based computer," in Proc. 8th Symp. Oper. Syst. Princ., Dec. 1981, pp. 14-16.
[61] S. R. Kimbleton, H. M. Wood, and M. L. Fitzgerald, "Network operating systems - An implementation approach," in Proc. AFIPS Conf., 1978.
[62] W. Kohler, "A survey of techniques for synchronization and recovery in decentralized computer systems," ACM Comput. Surveys, vol. 13, no. 2, June 1981.
[63] H. T. Kung and J. T. Robinson, "On optimistic methods for concurrency control," ACM Trans. Database Syst., vol. 6, no. 2, June 1981.
[64] L. Lamport, "Time, clocks, and the ordering of events in a distributed system," Commun. ACM, July 1978.
[65] L. Lamport, R. Shostak, and M. Pease, "The Byzantine generals problem," ACM Trans. Programming Lang. Syst., vol. 4, no. 3, July 1982.
[66] B. Lampson, "Atomic transactions," in Lecture Notes in Computer Science, Vol. 105, B. W. Lampson, M. Paul, and H. J. Siegert, Eds. Berlin: Springer-Verlag, 1980, pp. 365-370.
[67] R. E. Larsen, Tutorial: Distributed Control, IEEE Catalog No. EHO 153-7. New York: IEEE Press, 1979.
[68] E. Lazowska, H. Levy, G. Almes, M. Fischer, R. Fowler, and S. Vestal, "The architecture of the Eden system," in Proc. 8th Annu. Symp. Oper. Syst. Princ., Dec. 1981.
[69] G. LeLann, "Algorithms for distributed data-sharing systems which use tickets," in Proc. 3rd Berkeley Workshop Distrib. Databases Comput. Networks, 1978.
[70] --, "A distributed system for real-time transaction processing," IEEE Computer, vol. 14, Feb. 1981.
[71] P. H. Levine, "Facilitating interprocess communication in a heterogeneous network environment," Master's thesis, Massachusetts Inst. Technol., Cambridge, MA, June 1977.
[72] B. Lindsay, "Object naming and catalogue management for a distributed database manager," IBM Res. Rep. RJ2914, Aug. 1980.
[73] B. Liskov and R. Scheifler, "Guardians and actions: Linguistic support for robust, distributed programs," in Proc. 9th Symp. Princ. Programming Lang., Jan. 1982, pp. 7-19.
[74] M. T. Liu, D. Tsay, C. Chou, and C. Li, "Design of the distributed double-loop computer network (DDLCN)," J. Digital Syst., vol. V, no. 12, 1981.
[75] G. W. R. Luderer et al., "A distributed UNIX system based on a virtual circuit switch," in Proc. 8th Symp. Oper. Syst. Princ., Dec. 1981.
[76] J. R. McGraw and G. R. Andrews, "Access control in parallel programs," IEEE Trans. Software Eng., vol. SE-5, Jan. 1979.
[77] M. S. McKendry, J. E. Allchin, and W. C. Thibault, "Architecture for a global operating system," in Proc. IEEE INFOCOM, Apr. 1983.
[78] J. M. McQuillan and D. C. Walden, "The ARPA network design decisions," Comput. Networks, vol. 1, Aug. 1977.
[79] J. M. McQuillan, I. Richer, and E. C. Rosen, "The new routing algorithm for the ARPANET," IEEE Trans. Commun., vol. COM-28, May 1980.
[80] A. Meijer and P. Peeters, Computer Network Architectures. Rockville, MD: Computer Science Press, 1982.
[81] P. M. Melliar-Smith and R. L. Schwartz, "Formal specification and mechanical verification of SIFT," IEEE Trans. Comput., vol. C-31, July 1982.
[82] D. A. Menasce and R. R. Muntz, "Locking and deadlock detection in distributed data bases," IEEE Trans. Software Eng., vol. SE-5, no. 3, May 1979.
[83] P. M. Merlin and A. Segall, "A failsafe distributed routing protocol," IEEE Trans. Commun., vol. COM-27, Sept. 1979.
[84] R. M. Metcalfe and D. Boggs, "Ethernet: Distributed packet switching for local computer networks," Commun. ACM, vol. 19, July 1976.
[85] J. G. Mitchell and J. Dion, "A comparison of two network-based file servers," Commun. ACM, vol. 25, pp. 233-245, Apr. 1982.
[86] J. E. B. Moss, "Nested transactions and reliable distributed computing," in Proc. 2nd Symp. Reliability Distrib. Software Database Syst., July 1982.
[87] R. M. Needham and A. J. Herbert, The Cambridge Distributed Computing System. London: Addison-Wesley, 1982.
[88] B. J. Nelson, "Remote procedure call," Xerox Corp., Tech. Rep. CSL-81-9, May 1981.
[89] D. Oppen and Y. K. Dalal, "The clearinghouse: A decentralized agent for locating named objects in a distributed environment," Xerox Corp., Office Products Div. Rep. OPD-T8103, Oct. 1981.
[90] J. Ousterhout, D. Scelza, and P. Sindhu, "Medusa: An experiment in distributed operating system structure," Commun. ACM, vol. 23, Feb. 1980.
[91] J. Pierce, "How far can data loops go," IEEE Trans. Commun., vol. COM-20, June 1972.
[92] G. Popek et al., "LOCUS, a network transparent, high reliability distributed system," in Proc. 8th Symp. Oper. Syst. Princ., Dec. 1981, pp. 14-16.
[93] L. Pouzin, "Presentation and major design aspects of the Cyclades computer network," in Proc. 3rd ACM Data Commun. Conf., Nov. 1973.
[94] M. L. Powell and B. P. Miller, "Process migration in DEMOS/MP," in Proc. 9th Symp. Oper. Syst. Princ., Oct. 1983.
[95] K. Ramamritham and J. A. Stankovic, "Dynamic task scheduling in distributed hard real-time systems," IEEE Software, vol. 1, no. 3, July 1984.
[96] B. Randell, "Recursively structured distributed computing systems," in Proc. 3rd Symp. Reliability Distrib. Software Database Syst., Oct. 1983.
[97] R. Rashid, "An inter-process communication facility for UNIX," Carnegie-Mellon Univ., Pittsburgh, PA, Tech. Rep., June 1980.
[98] R. F. Rashid and G. G. Robertson, "Accent: A communication oriented network operating system kernel," in Proc. 8th Symp. Oper. Syst. Princ., Dec. 1981.
[99] D. R. Ries and M. R. Stonebraker, "Locking granularity revisited," ACM Trans. Database Syst., pp. 210-227, June 1979.
[100] D. J. Rosenkrantz, R. E. Stearns, and P. M. Lewis, "System level concurrency control for distributed database systems," ACM Trans. Database Syst., vol. 3, no. 2, June 1978.
[101] J. B. Rothnie, Jr., P. A. Bernstein, S. Fox, N. Goodman, M. Hammer, T. A. Landers, C. Reeve, D. W. Shipman, and E. Wong, "Introduction to a system for distributed databases (SDD-1)," ACM Trans. Database Syst., vol. 5, no. 1, pp. 1-17, Mar. 1980.
[102] L. A. Rowe and K. P. Birman, "A local network based on the UNIX operating system," IEEE Trans. Software Eng., vol. SE-8, no. 2, Mar. 1982.
[103] J. H. Saltzer, "Naming and binding of objects," in Operating Systems: An Advanced Course. Berlin: Springer-Verlag, 1978.
[104] J. H. Saltzer, D. P. Reed, and D. D. Clark, "End-to-end arguments in system design," in Proc. 2nd Int. Conf. Distrib. Comput. Syst., Apr. 1981.
[105] N. Sandell, P. Varaiya, M. Athans, and M. Safonov, "Survey of decentralized control methods for large scale systems," IEEE Trans. Auto. Cont., vol. AC-23, no. 2, Apr. 1978.
[106] A. Segall, "The modelling of adaptive routing in data-communication networks," IEEE Trans. Commun., vol. COM-25, no. 1, pp. 85-95, Jan. 1977.
[107] D. G. Severance and G. M. Lohman, "Differential files: Their application to the maintenance of large databases," ACM Trans. Database Syst., vol. 1, no. 3, Sept. 1976.
[108] S. K. Shrivastava and F. Panzieri, "The design of a reliable remote procedure call mechanism," IEEE Trans. Comput., vol. C-31, July 1982.
[109] D. Siewiorek and R. Swarz, The Theory and Practice of Reliable System Design. Bedford, MA: Digital, 1982.
[110] D. Skeen, "Nonblocking commit protocols," in Proc. ACM SIGMOD, 1981.
[111] --, "A decentralized termination protocol," in Proc. 1st IEEE Symp. Reliability Distrib. Software Database Syst., 1981.
[112] D. Skeen and M. Stonebraker, "A formal model of crash recovery in a distributed system," IEEE Trans. Software Eng., vol. SE-9, no. 3, May 1983.
[113] G. R. Smith, "The contract net protocol: High level communication and control in a distributed problem solver," IEEE Trans. Comput., vol. C-29, Dec. 1980.
[114] M. H. Solomon and R. A. Finkel, "The Roscoe distributed operating system," in Proc. 7th Symp. Oper. Syst. Princ., Mar. 1979.
[115] A. Z. Spector, "Performing remote operations efficiently on a local computer network," Commun. ACM, vol. 25, pp. 246-259, Apr. 1982.
[116] A. Z. Spector and P. M. Schwarz, "Transactions: A construct for reliable distributed computing," ACM Oper. Syst. Rev., vol. 17, no. 2, Apr. 1983.
[117] S. K. Srivastava, "On the treatment of orphans in a distributed system," in Proc. 3rd Symp. Reliability Distrib. Syst., Oct. 1983.
[118] W. Stallings, Local Networks. New York: Macmillan, 1984.
[119] J. A. Stankovic, "The types and interactions of vertical migrations of functions in a multi-level interpretive system," IEEE Trans. Comput., vol. C-30, July 1981.
[120] --, "Improving system structure and its affect on vertical migration," Microprocessing and Microprogramming, vol. 8, nos. 3-5, Dec. 1981.
[121] --, "ADCOS - An adaptive, system-wide, decentralized controlled operating system," Univ. Massachusetts, Amherst, MA, Tech. Rep. ECE-CS-81-2, 1981.
[122] --, "Software communication mechanisms: Procedure calls versus messages," IEEE Computer, vol. 15, Apr. 1982.
[123] --, "Simulations of three adaptive, decentralized controlled, job scheduling algorithms," Comput. Networks, vol. 8, no. 3, pp. 199-217, June 1984.
[124] --, "Bayesian decision theory and its application to decentralized control of job scheduling," IEEE Trans. Comput., vol. C-34, Jan. 1985, to be published.
[125] J. A. Stankovic and I. S. Sidhu, "An adaptive bidding algorithm for processes, clusters and distributed groups," in Proc. 4th Int. Conf. Distrib. Comput., May 1984.
[126] J. A. Stankovic, K. Ramamritham, and W. Kohler, "Current research and critical issues in distributed system software," Dep. Elec. Comput. Eng., Univ. Massachusetts, Amherst, MA, Tech. Rep., 1984.
[127] H. S. Stone, "Multiprocessor scheduling with the aid of network flow algorithms," IEEE Trans. Software Eng., vol. SE-3, Jan. 1977.
[128] --, "Critical load factors in distributed computer systems," IEEE Trans. Software Eng., vol. SE-4, May 1978.
[129] H. S. Stone and S. H. Bokhari, "Control of distributed processes," IEEE Computer, vol. 11, pp. 97-106, July 1978.
[130] M. Stonebraker and E. Neuhold, "A distributed database version of INGRES," in Proc. 1977 Berkeley Workshop Distrib. Data Management Comput. Networks, pp. 19-36.
[131] M. Stonebraker, "Operating system support for database management," Commun. ACM, vol. 24, pp. 412-418, July 1981.
[132] H. Sturgis, J. Mitchell, and J. Israel, "Issues in the design and use of a distributed file system," ACM Oper. Syst. Rev., July 1980.
[133] D. Swinehart, G. McDaniel, and D. Boggs, "WFS: A simple shared file system for a distributed environment," in Proc. 7th Symp. Oper. Syst. Princ., Dec. 1979.
[134] A. S. Tanenbaum, Computer Networks. Englewood Cliffs, NJ: Prentice-Hall, 1981.
[135] R. R. Tenney and N. R. Sandell, Jr., "Structures for distributed decisionmaking," IEEE Trans. Syst., Man, Cybern., vol. SMC-11, pp. 517-527, Aug. 1981.
[136] --, "Strategies for distributed decisionmaking," IEEE Trans. Syst., Man, Cybern., vol. SMC-11, pp. 527-538, Aug. 1981.
[137] R. H. Thomas, "A majority consensus approach to concurrency control for multiple copy databases," ACM Trans. Database Syst., vol. 4, no. 2, pp. 180-209, June 1979.
[138] D. Tsay and M. Liu, "MIKE: A network operating system for the distributed double-loop computer network," IEEE Trans. Software Eng., vol. SE-9, no. 2, Mar. 1983.
[139] A. van Dam and J. Michel, "Experience with distributed processing on a host/satellite graphics system," in Proc. SIGGRAPH, July 1976.
[140] B. G. Walker, G. Popek, R. English, C. Kline, and G. Thiel, "The LOCUS distributed operating system," in Proc. 9th Symp. Oper. Syst. Princ., Oct. 1983.
[141] S. Ward, "TRIX: A network oriented operating system," in Proc. COMPCON, 1980.
[142] M. V. Wilkes and R. M. Needham, The Cambridge CAP Computer and Its Operating System. Amsterdam, The Netherlands: Elsevier North-Holland, 1979.
[143] L. Wittie and A. M. Van Tilborg, "MICROS, a distributed operating system for MICRONET, a reconfigurable network computer," IEEE Trans. Comput., vol. C-29, Dec. 1980.
[144] W. Wulf, E. Cohen, W. Corwin, A. Jones, R. Levin, C. Pierson, and F. Pollack, "HYDRA: The kernel of a multiprocessor operating system," Commun. ACM, vol. 17, June 1974.

John A. Stankovic (S'77-M'79) received the Sc.B. degree in electrical engineering in 1970, and the Sc.M. and Ph.D. degrees in computer science in 1976 and 1979, respectively, all from Brown University, Providence, RI. He is now an Associate Professor in the Department of Electrical and Computer Engineering, University of Massachusetts, Amherst. He has been active in distributed systems research since 1976. His current research includes various approaches to process scheduling on loosely coupled networks and recovery protocols for distributed databases. He has been involved in CARAT, a distributed systems testbed project at the University of Massachusetts. Prof. Stankovic was coeditor of the January 1978 Special Issue of IEEE Computer on Distributed Processing. He now serves as the Vice Chairman of the IEEE Technical Committee on Distributed Operating Systems. In this capacity he has been responsible for serving as the Editor of two Special Issues of the Technical Committee's Newsletter. He received the 1983 Outstanding Junior Faculty Award for the School of Engineering, University of Massachusetts. He is a member of ACM and Sigma Xi.