IEEE TRANSACTIONS ON COMPUTERS, VOL. C-33, NO. 12, DECEMBER 1984
A Perspective on Distributed Computer Systems
JOHN A. STANKOVIC,
MEMBER, IEEE
(Invited Paper)
Abstract - Distributed computer systems have been the subject
of a vast amount of research. Many prototype distributed computer systems have been built at university, industrial, commercial, and government research laboratories, and production
systems of all sizes and types have proliferated. It is impossible to
survey all distributed computing system research. Instead, this
paper identifies six fundamental distributed computer system research issues, points out open research problems in these areas,
and describes how these six issues and solutions to problems associated with them transect the communications subnet, the distributed operating system, and the distributed database areas. It is
intended that this perspective on distributed computer system
research serve as a form of survey and, more importantly, illustrate and encourage a better integration and exchange of ideas
from various subareas of distributed computer system research.
Index Terms - Communications subnet, computer networks,
distributed computer systems, distributed databases, distributed
operating systems, distributed processing, system software.
I. INTRODUCTION
A DISTRIBUTED computer system (DCS) is a collection of processor-memory pairs connected by a communications subnet and logically integrated in varying degrees
by a distributed operating system and/or distributed database system. The communications subnet may be a widely
geographically dispersed collection of communication processors or a local area network. The widespread use of distributed computer systems is due to the price-performance
revolution in microelectronics, the development of cost effective and efficient communication subnets (which is itself
due to the merging of data communications and computer
communications), the development of resource sharing software, and the increased user demands for communication,
economical sharing of resources, and productivity.
A DCS potentially provides significant advantages, including good performance, good reliability, good resource
sharing, and extensibility [31], [36], [56]. Potential performance enhancement is due to multiple processors and
an efficient subnet, as well as avoiding contention and
bottlenecks that exist in uniprocessors and multiprocessors.
Potential reliability improvements are due to the data and
control redundancy possible, the geographical distribution of
the system, and the ability for mutual inspection of hosts
and communication processors. With the proper subnet and
distributed operating system, it is possible to share hardware
and software resources in a cost effective manner, increasing
productivity and lowering costs.
Manuscript received May 7, 1984; revised July 14, 1984.
The author is with the Department of Electrical and Computer Engineering,
University of Massachusetts, Amherst, MA 01003.
Possibly the most important potential advantage of a DCS
is extensibility. Extensibility is the ability to easily adapt to
both short and long term changes without significant disruption of the system. Short term changes include varying
workloads and subnet traffic, and host or subnet failures or
additions. Long term changes are associated with major
modifications to the requirements of the system.
In trying to achieve the advantages of DCS's, the scope of
research has been very broad. In spite of this, there is a
relatively small number of fundamental issues dominating
the field. Solutions to these fundamental issues have not yet
been consolidated in a comprehensive way, thereby thwarting the full potential of DCS's. After a brief overview of DCS
research (Section II), this paper provides a perspective on six
fundamental DCS issues (the object model, access control,
distributed control, reliability, heterogeneity, and efficiency), identifies problems associated with these issues,
shows how these issues interrelate, and describes how they
are addressed in different subareas of DCS research
(Section III). It is intended that this perspective on DCS
research serve as a form of survey and, more importantly,
illustrate and encourage a better integration and exchange of
ideas from various subareas of DCS research. To keep the
scope of this paper reasonable, two fundamental issues, research in the theory and specification of distributed systems,
and the need for a distributed systems methodology are not
specifically discussed. A theory of distributed systems is
needed to better understand theoretical limitations and complexity. Specification languages must be extended to better
treat parallelism, reliability, the distributed nature of the system being specified, and the correctness of the system. A
methodology for the design, construction, and maintenance
of large complex distributed systems is necessary. This methodology must address the specific problems of DCS's such as
distribution and parallelism. Finally, Section IV contains
summary remarks.
II. DISTRIBUTED COMPUTER SYSTEMS
DCS research encompasses many areas, including: the
communication subnet, local area networks, distributed
operating systems, distributed databases, concurrent and distributed programming languages, specification languages for
concurrent systems, theory of parallel algorithms, parallel
architectures and interconnection structures, fault tolerant
and ultrareliable systems, distributed real-time systems, cooperative problem solving techniques of artificial intelligence, distributed debugging, distributed simulation, and
distributed applications [23], [47], [89]. There are also the
associated efforts of modeling and analysis, testbed development, measurement and evaluation, and prototype implementations. An extensive survey and bibliography would
require hundreds of pages. In this section we concentrate on
briefly categorizing and identifying actual distributed computer systems, rather than all related research. At the end of
this paper there is a list of textbooks, survey articles, and
references to provide further information on these research
areas. For a more extensive survey of DCS research issues
see [126].
Strictly speaking, a DCS is the sum total of the physical
network and all the controlling software. Therefore, each
category discussed below (local area networks, wide area
networks, network operating systems, distributed operating
systems, distributed file servers, distributed real-time systems, and distributed databases) actually refers to a particular
aspect of a DCS and not an entire DCS.
Local Area Networks: According to Stallings [118] "a local network is a communications network that provides interconnection of a variety of data communicating devices within
a small area." A small area generally refers to a single building or possibly spanning several buildings. A network with a
radius of 20 km would be on the border between a local network and a
long haul network. Local networks are sometimes classified
into three types: a local area network (LAN), a high speed
local network (HSLN) which is typically found in a large
computer center, or a digital switch/computerized branch exchange (CBX). Typical LAN's are the ring, the baseband
bus, and the broadband bus. HSLN's and CBX's are not discussed in this paper.
Common protocols for accessing ring LAN's are the Newhall (token) protocol [38], [39], the IEEE 802 token ring
protocol [118], the Pierce protocol (the slotted ring) [91], or
the delay insertion protocol [74]. Prime and Apollo [15]
manufacture token rings; the Cambridge ring [87] and
Spider [72] are examples of slotted rings; and DDLCN [74]
is a delay insertion ring.
A baseband network refers to transmission of signals without modulation and the entire spectrum of the medium (cable)
is consumed by the signal. Baseband LAN's are typically
accessed via a carrier sensed multiaccess/collision detect
(CSMA/CD) protocol commonly referred to as Ethernet
[84]. DEC, Intel, and Xerox support a product which runs the
Ethernet protocol.
Broadband LAN's use frequency division multiplexing
and divide the spectrum of the medium (twisted pair, coaxial
cable, or fiber optics) into channels each of which carries
analog signals or modulated digital data. For example, some
channels may be used for point-to-point data communication,
other channels can utilize a contention protocol such as
Ethernet, other channels may be assigned for video traffic,
and yet other channels may be dedicated to voice traffic.
Wang manufactures a broadband LAN.
Wide Area Networks: A wide area network (WAN) is a
geographically dispersed collection of hosts and communication processors where the distances involved are large.
Another common name for these networks is long haul networks. WAN's include ARPANET [78], Cyclades [93],
TYMNET, and Telenet. Commercial vendors and several
standards organizations have developed computer network
architectures. A network architecture describes the functionality of a computer network in a structured and layered
manner. The ISO reference model [134] is typical of these
architectures and it contains seven layers: the physical,
data link, network, transport, session, presentation, and
application layers. Some examples of commercial network
architectures which are used in a wide variety of applications include: Burroughs' Network Architecture, DEC's DECNET, Honeywell's Distributed Systems Architecture, IBM's Systems Network Architecture, Siemens' TRANSDATA, and Univac's Distributed Communication Architecture [80]. Strictly speaking, network architectures exist on
both LAN's and WAN's, but they are listed here because they
were first conceived within the WAN context.
Network Operating Systems: Consider the situation where
each of the hosts of a computer network has a local operating
system that is independent of the network. The sum total
of all the operating system software added to each host in
order to communicate and share resources is called a network
operating system (NOS). The added software often includes
modifications to the local operating system. NOS's are characterized by being built on top of existing operating systems,
and they attempt to hide the differences between the underlying systems. The most famous example of such a computer
network is ARPANET and it contains several NOS's, e.g.,
RSEXEC and NSW [71]. RSEXEC includes a uniform
command language interpreter and a networkwide execution
environment for user programs. It is intended as an experimental vehicle for exploring NOS issues. NSW is a NOS
that supports software development by providing a uniform
access to a set of software tools. A systemwide file system is
a major element of NSW. XNOS [61] is another example of
a NOS.
Distributed Operating Systems: Consider an integrated
computer network where there is one native operating system
for all the distributed hosts. This is called a distributed operating system (DOS). A DOS is designed with the network
requirements in mind from its inception and it tries to manage
the resources of the network in a global fashion. Therefore,
retrofitting a DOS to existing operating systems and other
software is not a problem in DOS's. Since DOS's are used to
satisfy a wide variety of requirements their various implementations are quite different. Table I lists a number of
DOS's. Note that the boundary between NOS's and DOS's is
not clearly distinguishable.
Distributed File Servers: A file system is an integral part
of a NOS, a DOS, and a distributed database system. Hence,
there has been a great deal of research in this area. In most
distributed systems the file system is considered a server
which fields requests from users as well as from the rest of the
operating system. File servers support the illusion that there
is a single logical file system when, in fact, there may be
many different file systems. Depending on the level of
sophistication implemented, the file server may support replication, movement of files, and reliable updates to files in
addition to the common file system commands. Typical file
TABLE I
DISTRIBUTED OPERATING SYSTEMS

ACCENT [98], ADCOS [121], AEGIS [15], ArchOS [57], CAP [87], CHORUS [52], CLOUDS [77], DCS [38], DEMOS/MP [94], Domain Structure [26], HXDP [56], LOCUS [92], Medusa [90], MICROS [143], MIKE [138], RIG [18], ROSCOE [114], STAROS [59], TRIX [148], UNIX (6)^a, EDEN [53], WEB [68], Fully DP [37].

^a The (6) after UNIX indicates that there are at least six extensions to UNIX for distributed systems, including LOCUS [92], [97], [140], and those extensions done at Bell Laboratories [75], Berkeley [102], Purdue, and Newcastle upon Tyne [96].
servers are: the Cambridge file server [32], DFS [132], the
Felix file server [43], ROE [40], Violet [46], and WFS [133].
See [85] for a comparison of the Xerox DFS file server and
the Cambridge file server.
Distributed Real-Time Systems: Nuclear power plants and
process control applications are inherently distributed and
have severe real-time constraints and reliability requirements. These constraints add considerable complication to
a DCS. Airline reservation and banking applications are
also distributed, but have less severe real-time and reliability constraints and are easier to build. Examples of the
more demanding real-time systems include ESS [19],
REBUS [16], and SIFT [81]. ESS is a software controlled
electronic switching system developed by the Bell System for
placing telephone calls. The system meets severe real-time
and reliability requirements. REBUS is a fault tolerant distributed system for industrial real-time control, and SIFT is
a fault tolerant flight control system.
Distributed Databases: An important application of DCS's
is to allow databases (in contrast to a file system) to be
geographically distributed across the network while at the
same time being logically integrated. This distribution often
follows some natural distribution, such as the records of
airline reservations being at the site of plane departures, or a
company's inventory, personnel, orders, and production
records residing at the appropriate locations. Increased reliability and performance can also be attained with a distributed database. The best known distributed database systems
are Distributed Ingres [127], R* [72], and SDD-1 [21],
[101]. Distributed Ingres, developed at the University of
California at Berkeley, is based on the relational model
and as its name suggests is an extension of Ingres. R* is
also based on a relational model and is an IBM research
project. SDD-1 has been built by Computer Corporation of
America in a layered architecture and is specifically designed
for reliability.
This list of actual DCS's serves to outline the various
research areas that we consider in a combined manner in the
remainder of the paper. It should be noted that many of the
systems referenced in this section are experimental systems,
each grappling with the fundamental research issues discussed in the next section. Performance data concerning well-defined experiments run on these systems would provide valuable input to all researchers in their attempts to formulate solutions, models, and a general methodology and theory for distributed systems. On the other hand, even though these current systems do not theoretically achieve the full "potential" benefits of distributed computing, they are important systems in their own right, contributing to the state of the art and, in general, meeting their own specific requirements quite well.
III. FUNDAMENTAL ISSUES
This section provides a perspective on six interrelated DCS
issues: the object model, access control, distributed control,
reliability, heterogeneity, and efficiency. These fundamental
issues are described, problems associated with these issues
are identified, the relative importance of the issues is discussed, and the issues are viewed as topics which span the
subnet, operating system, and database levels. It is suggested
that solutions from each of these areas can be better utilized
in all areas.
A. The Object Model
A data abstraction is a collection of information and a set
of operations defined on that information. An object is an
instantiation of a data abstraction. The concept of an object
is usually supported by a kernel which may also define a
primitive set of objects. Higher level objects are then constructed from more primitive objects in some structured fashion. All hardware and software resources of a DCS can be
regarded as objects. The concept of an object and its implications form an elegant basis for a DCS [59], [60], [103],
[144]. For example, a process can be considered as an
execution trace through a dynamically changing collection of
objects. The set of all objects required by a process at some instant of time is called a domain [26]. This includes code and environment objects such as process control blocks and file control
blocks. The necessary set of information needed when
moving a process is then clearly delineated by the domain.
There are other complications of course because some of the
objects needed by a process may be owned entirely by this
process, others may be simultaneously shareable with other
processes, and yet others may be shared serially. The elegance of the object model is also witnessed in naming,
addressing, and accessing objects which can all be handled
by "system" objects. Individual objects communicate with
each other via well defined communication primitives (such
as SEND/WAIT and SEND/NOWAIT) [122].
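To make these notions concrete, the following is a minimal sketch of an object and of a domain as the set of objects a process currently requires; all class and method names are hypothetical, not drawn from any system cited above.

```python
class DCSObject:
    """An instantiation of a data abstraction: information plus
    the operations defined on that information."""
    def __init__(self, name, state):
        self.name = name
        self.state = state            # the encapsulated information

    def invoke(self, operation, *args):
        # All access goes through well defined operations.
        return getattr(self, operation)(*args)


class FileObject(DCSObject):
    def read(self):
        return self.state


class Process:
    """A process viewed as an execution trace through a
    dynamically changing collection of objects."""
    def __init__(self, pid):
        self.pid = pid
        self.domain = set()           # objects required at this instant

    def acquire(self, obj):
        self.domain.add(obj)          # the domain grows as the trace proceeds

    def migrate(self, new_host):
        # The domain delineates exactly what must move with the process.
        return {"pid": self.pid, "host": new_host,
                "objects": [o.name for o in self.domain]}


fcb = FileObject("fcb_1", state="file control block contents")
p = Process(pid=7)
p.acquire(fcb)
assert p.migrate("host_2")["objects"] == ["fcb_1"]
```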
The idea of an object as described above is an abstraction
that is independent of the distribution issues found in DCS's.
It is almost trivial to extend the object abstraction to distributed systems. Typical distributed systems functions such as
(static and dynamic) allocation of objects to a host, moving
objects, remote access to objects, sharing objects across the
network, and providing interfaces between disparate objects
are all "conceptually" simple because they are all handled by
yet other objects. The object concept is powerful and most
DOS's listed in Section II are based on objects.
In DCS's, complications arise when it is time to decide
how an object is to perform its function. In many cases,
though, this problem is reduced if an object model has been
used. Consider a global scheduler that decides to move a
process to another host on the network because it predicts improved response time. The fact that all the resources required by that process at a given time form an easily detectable collection of objects (a domain) facilitates this movement. As
another example of the advantages of objects, consider a
centralized name server that is causing a performance bottleneck and reliability problems. Reprogramming the name
server in a distributed fashion might relieve these problems
without affecting users of the name server object because to
users of the name server the interface has remained the same.
Objects also serve as the primitive entity in more complicated structuring of distributed computation. One type of
distributed computation is a process which executes as a
sequential trace through objects, but does so across multiple
hosts (Fig. 1). Work such as [26], [127]-[129], [139] has been based on this definition. Another form of a distributed
computation is to have clusters of processes (collections of
objects) running in parallel and communicating with each
other in any number of ways based on the type of interprocess
communication (IPC) used. The cluster of processes may be
colocated or distributed in some fashion (Fig. 2). As an example, this form of computation is used in [59], [90], [125].
Other notions of what constitutes a distributed program are
possible, but most are also based on an object model.
The major problem with object based systems has been
poor execution time performance. However, this is not really
a problem with the object abstraction itself, but is a problem
with inefficient implementation of access to objects. The
most common reason given for poor execution time is that
current architectures are ill suited for object based systems.
Another problem is choosing the right granularity for an
object [49], [99]. If every integer or character and their associated operations are treated as objects, then the overhead is
too high. If the granularity of an object is too large, then the
benefits of the object based system are lost. Needed are better
architectures, better performance analyses to choose good
object granularity, and the ability to develop object based
systems where compilers automatically produce optimized
machine code avoiding object overhead when feasible [120].
The object model is considered a fundamental issue of
DCS research because it is a convenient primitive for DCS's,
simplifying design, implementation, and extensibility. Further, all of the other fundamental DCS issues can be discussed in terms of objects. This is illustrated throughout the
remainder of the paper.
B. Access Control
A distributed system contains a collection of hardware and
software resources each of which can be modeled as objects.
Access to the resources [76] must be controlled in two ways. One, the manner used to access the resource must be suitable
to the resource and requirements under consideration. For
example, a printer or bus must be shared serially (mutual
exclusion), local data of an object should be unshared, a read
[Fig. 1. Distributed computation_1: a distributed program P_A + P_B + P_C spanning Hosts 1, 2, and 3; numbers 1-5 indicate the order of the sequential trace through the program.]

[Fig. 2. Distributed computation_2: processes P1-P6 (collections of objects and an execution trace through some objects) running in parallel, communicating via SEND/WAIT and SEND/NOWAIT primitives. Note: another variation would permit a process, say P4, to exist across hosts as in Fig. 1.]
only file (an object) can be accessed simultaneously by any
number of users, and multiple copies of a file must be consistent. Two, access to a resource must be restricted to the set
of allowable users. This is usually done by an access control
list, an access control matrix, or by capabilities. Dynamic
changes to access rights of a user cause some difficulties, but
solutions do exist [58]. In the remainder of this section we
expound upon the first aspect of access control because it is
more difficult and interesting than restricting the set of allowable users.
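As a small illustration of this second aspect, restricting access to the set of allowable users, the sketch below checks requests against an access control list; the resources, users, and rights are invented for the example.

```python
# Hypothetical ACL: resource -> user -> set of rights held by that user.
ACL = {
    "printer": {"alice": {"use"}, "bob": {"use"}},
    "payroll": {"alice": {"read", "write"}},
}

def check_access(user, resource, right):
    """Grant the request only if the ACL lists this right for this user."""
    return right in ACL.get(resource, {}).get(user, set())

assert check_access("alice", "payroll", "write")
assert not check_access("bob", "payroll", "read")   # bob is not an allowable user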
Serial sharing of a single resource can be enforced in any
number of ways including locking. However, in LAN's and
WAN's we find many other interesting techniques. For
example, serial access to a ring can be enforced by using a
token protocol. Time division multiplexing is a static serial
sharing of the communication medium. Polling is a technique
where a central host queries stations, one at a time, as to
whether they have traffic to send, thereby enforcing serial
sharing. Reservation schemes, a type of collision free protocol, require stations to reserve future time slots on the communication medium to completely avoid conflict. Another
class of access protocols, called limited contention protocols,
allows contention for serial use of the bus under light loads
to reduce delays, but essentially becomes a collision free
protocol under heavy loads, thereby increasing channel
efficiency. See [134] for a full description of collision free
and limited contention protocols.
We believe that most ideas from the subnet research area
can be better exploited in other parts of DCS's. For example,
one idea which has already been used is a "virtual" token
scheme [70] for concurrency control in a distributed database
system. The token circulates on a virtual ring and carries with
it a sequencer that delivers sequential and unique integer
values called tickets. This scheme insures serial access to
data items by allowing access to an object only if the requestor holds the proper ticket. Another potential area for
sharing ideas is in distributed time critical applications where
such applications could make use of reservation and limited
contention protocols in controlling access to objects. This
approach would better guarantee access within the time
constraints.
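A sketch of the circulating-token-with-tickets idea described above follows. The classes and single-item granularity are illustrative simplifications, not the protocol of [70]: the token's sequencer hands out unique tickets, and a data item grants access only to the holder of the next expected ticket.

```python
import itertools

class CirculatingToken:
    """The token carries a sequencer delivering unique integer tickets."""
    def __init__(self):
        self._sequencer = itertools.count(1)

    def issue_ticket(self):
        return next(self._sequencer)

class DataItem:
    def __init__(self):
        self.next_ticket = 1                 # ticket now allowed to proceed

    def access(self, ticket, action):
        if ticket != self.next_ticket:
            return False                     # requestor must wait its turn
        action()
        self.next_ticket += 1                # serial access is enforced
        return True

token, item = CirculatingToken(), DataItem()
t1, t2 = token.issue_ticket(), token.issue_ticket()
assert not item.access(t2, lambda: None)     # out of order: refused
assert item.access(t1, lambda: None)         # in order: granted
```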
In addition to resources which must be shared serially,
DCS's contain resources which can be shared simultaneously. Resources that can be shared simultaneously pose no
difficulty if accessed individually. However, it is sometimes
necessary to access a group of resources (objects) at the same
time. This gives rise to transactions which are used extensively in the database area. A transaction is an abstraction
which allows programmers to group a sequence of actions
into a logical unit. If executed atomically (either the entire
transaction is executed or none of it is executed), a transaction transforms a current consistent state of the resources
into a new consistent state. The virtues of transactions are
well documented [51], [66] and are important enough that
transactions have also appeared in distributed programming
languages [73], and in distributed operating systems [77],
[116], [131].
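The sketch below illustrates the atomicity property itself on a single host: actions execute against a shadow copy that is installed only if every action succeeds. It is an illustration of the abstraction, not of any cited system's commit machinery.

```python
def run_transaction(state, actions):
    """Apply all actions to a shadow copy; install it only on success."""
    shadow = dict(state)                 # tentative, isolated copy
    try:
        for action in actions:
            action(shadow)
        state.clear()
        state.update(shadow)             # commit: new consistent state
        return True
    except Exception:
        return False                     # abort: original state untouched

accounts = {"a": 100, "b": 0}
ok = run_transaction(accounts, [
    lambda s: s.__setitem__("a", s["a"] - 50),
    lambda s: s.__setitem__("b", s["b"] + 50),
])
assert ok and accounts == {"a": 50, "b": 50}   # all of it, or none of it
```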
Multiple transactions may have access problems among
themselves. Protocols for resolving data access conflicts between transactions are called concurrency control protocols
[21], [69], [137]. Three major classes of concurrency control
protocols are: locking, timestamp ordering [100], and validation (also called the optimistic approach [63]).
Locking is a well known technique that is already used at
all levels of a system. Timestamp ordering is an interesting
approach where all accesses to data are timestamped and then
some common rule is followed by all transactions in such a
way as to ensure serial access [137]. This technique should be
useful at all levels of a system, especially if timestamps are
already being generated for other reasons such as to detect
lost messages or failed hosts. A variation of the timestamp
ordering approach is the multiversion approach. This
approach is interesting because it integrates concurrency
control with recovery. In this scheme, each change to the
database results in a new version. Concurrency control is
enforced on a version basis and old versions serve as checkpoints. Handling multiple versions efficiently is accomplished by differential files [107]. Validation is a technique
which permits unrestricted access to data items but then there
are checks for potential conflicts at the commit point. The
commit point is the time at which a transaction is sure that it
will complete. This approach is useful when few conflicts are
expected.
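As an illustration, the sketch below applies the basic timestamp ordering rule to a single data item: an access bearing a timestamp older than a conflicting access already processed forces a restart. The rule shown is one common variant rather than the specific protocols of [100] or [137], and restart handling is omitted.

```python
class TimestampedItem:
    """Each transaction carries a unique timestamp; the item remembers
    the largest read and write timestamps it has processed."""
    def __init__(self):
        self.max_read_ts = 0
        self.max_write_ts = 0
        self.value = None

    def read(self, ts):
        if ts < self.max_write_ts:
            raise RuntimeError("restart: a younger transaction already wrote")
        self.max_read_ts = max(self.max_read_ts, ts)
        return self.value

    def write(self, ts, value):
        if ts < self.max_read_ts or ts < self.max_write_ts:
            raise RuntimeError("restart: a conflicting later access occurred")
        self.max_write_ts = ts
        self.value = value

item = TimestampedItem()
item.write(ts=1, value="x")
item.read(ts=3)                 # a younger transaction reads
try:
    item.write(ts=2, value="y") # an older write arrives too late
except RuntimeError:
    pass                        # that transaction must restart
```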
The correctness of a concurrency control protocol is
usually based on the concept of serializability [22].
Serializability means that the effect of a set of executed
transactions (permitted to run in parallel) must be the same as
some serial execution of that set of transactions. In many
cases and at all levels the strict condition of serializability is
not required. Relaxing this requirement can usually result in
access techniques that are more efficient in the sense of allowing more parallelism and faster execution times. An example of this is the scheduling function. The scheduling
algorithm is running in parallel on many hosts and must make
decisions quickly. The scheduler would rather make quick
decisions using somewhat inconsistent data than have to lock
all the distributed information that it might use in arriving
at a solution. A research problem is determining the value
that the completeness and accuracy of information contributes to the quality of the result (in this case response time
or throughput).
The concurrency control protocols as well as the previously mentioned access control techniques of the subnet are
implemented across the entire spectrum from centralized to
distributed control. Hence, access control is very closely
related to the discussion in the next section on distributed
control.
Further, some access control may be done statically (e.g.,
scope rules in programming languages), but most access control must occur dynamically during system operation. For
example, the data items to be accessed by a process may not
be known prior to execution, and therefore, access control to
these data items must be controlled dynamically. How dynamic access control is done affects the performance and
fairness of the protocol. Normal access must sometimes be
bypassed for external interrupts, for alarms, or for failures.
Careful normal access control coupled with techniques for
failure situations signifies that access control is also highly
related to reliability. A particular access control technique
must resolve deadlock and livelock, two other issues related to performance and reliability [48], [82].
Finally, we point out that objects and access to them are
intimately related and of equal importance. Access control
may be implemented by a distributed control protocol and has
important effects on efficiency and reliability. It is a complicated problem to choose the right access control technique for
a given set and type of objects. Ideas for solutions to the
access control problem from the subnet, NOS, DOS, and
database areas should be interchanged to obtain better performance and reliability.
C. Distributed Control
Depending on the application and requirements, distributed control can take on many forms. Although research has been active for all types of distributed control, the majority of the work is based on extensions to the centralized
state-space model and can be more accurately described as
decomposition techniques [135], [136], rather than distributed control. In such work, large scale problems are partitioned into smaller problems, each smaller problem being
solved, for example, by mathematical programming techniques, and the separate solutions being combined via interaction variables. In many cases the interaction variables are
considered negligible and in others they are limited in the
sense that they model very limited cooperation. See
[55], [67], [105] for excellent surveys of these types of distributed control (decomposition). Note that in many of these
cases it is an irrelevant detail that individual subproblems are
solved in parallel on different computers and reliability is
often not an issue. Further, many of the techniques require
extensive computing power making them more suitable for
application programs rather than system functions.
For DCS's, another form of distributed control is also
important; that is, distributed control where decomposition is
not possible to any large degree and where cooperation
among the distributed controllers is the most important aspect. The DCS environment also results in a set of additional
very demanding requirements. For example, some functions
implemented in this environment are dynamic, distributed,
adaptive, asynchronous, and operate in noisy, error prone,
and uncertain (stochastic) situations. Another important aspect of the environment is that significant delays in the movement of data are commonplace and must be accounted for. It
is important to note that the stochastic nature of the system
being controlled affects two distinct aspects of the functions: an individual controller's view of the system is an
estimate, and future random forces can affect the system
independently of the control decision. Scheduling, startup of
multiple distributed processes of a system or application program, and processes which test the reasonableness of distributed data are examples of this form of distributed control. We
will now consider distributed control of this more demanding
type across three levels: the subnet, the DOS, and the distributed database levels.
Functions in the subnet such as access control (discussed
above), routing [44], [106], and congestion control are good
candidates for distributed control. Routing is the decision
process which determines the path a message follows in passing from its source to its destination. Some routing schemes
are completely fixed, others contain fixed alternate paths
where the alternative is chosen only on failures. These nonadaptive schemes are too limited and do not constitute distributed control. Adaptive routing schemes modify routes
based on changing traffic patterns. Adaptive routing schemes
may be centralized where a routing control center calculates
good paths and then distributes these paths to the individual
hosts of the network in some periodic or aperiodic fashion.
This is not distributed control either.
Routing algorithms which exhibit distributed control typically contain n copies of the algorithm (one at each communication processor). Information is exchanged among
communication processors periodically or asynchronously as
a result of some noticeable change in traffic. The information exchanged varies depending on the metric used by the
algorithm (e.g., the metric might be number of hops, some
estimate of delay to the destination, or buffer lengths). Each
copy of the routing algorithm uses the exchanged (out-ofdate) information in making routing decisions. Such algorithms have the potential for good performance and reliability
because the distributed control can operate in the presence of
failures and quickly adapt to changing traffic patterns. On the
other hand, several new problems arise in such algorithms. If
the algorithm is not careful, then phenomena known as ping-ponging (message looping) and poor reaction to "bad news"
might occur [134]. These problems were recognized in the
old ARPANET distributed routing algorithm and various
solutions now exist [79]. Rather than continuing the discussion of such algorithms, we simply note that there is a
large degree of similarity between routing messages in the
subnet and scheduling processes on hosts of the DCS. We
consider scheduling in Section III-F and the reader is encouraged to note the similarities.
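To make the flavor of such algorithms concrete, the sketch below shows one revision step in which a host recomputes its delay estimates from the (possibly out-of-date) tables its neighbors exchanged; the distance-vector style update rule is used purely as an illustration, not as the ARPANET algorithm.

```python
INF = float("inf")

def update_routes(my_table, neighbor_tables, link_delay):
    """my_table: dest -> (next_hop, est_delay); neighbor_tables:
    nbr -> that neighbor's table; link_delay: nbr -> delay to nbr."""
    revised = {}
    for dest, (best_hop, best) in my_table.items():
        for nbr, table in neighbor_tables.items():
            # Neighbor estimates are out of date; the algorithm lives with that.
            est = link_delay[nbr] + table.get(dest, (None, INF))[1]
            if est < best:
                best_hop, best = nbr, est      # neighbor offers a shorter path
        revised[dest] = (best_hop, best)
    return revised

mine = {"D": ("B", 10.0)}
nbrs = {"C": {"D": (None, 3.0)}}
assert update_routes(mine, nbrs, {"C": 2.0})["D"] == ("C", 5.0)
```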
Another type of distributed routing algorithm is based on
"n" spanning trees being maintained in the network (one at
each host). Each spanning tree identifies all "n" destinations
in the DCS [83]. Each spanning tree is largely independent
of the other trees so this is not a highly cooperative type of
distributed control. Such an approach does have a number
of advantages such as guaranteeing that there will be no
looping of messages.
When too many messages are in the subnet, performance
degrades. Such a situation is called congestion. In fact, depending on the subnet protocols it may happen that at high
traffic, performance collapses completely, and almost no
packets are delivered. Solutions include preallocation of
buffers and performing message discarding only when there
is congestion. This is usually done by choke packets [134]. A
particularly interesting distributed control algorithm for congestion control is called isarithmic congestion control [134].
In this scheme a set of permits circulates around the subnet
and a set of permits is fixed at each host. Whenever a communication processor wants to transmit a message it must first
acquire a permit, either one assigned to that site (and not
being used) or a circulating permit. When a destination communication processor removes a message from the subnet it
regenerates the permit. Stationary permits are considered
free upon message acknowledgment. This scheme limits the
number of messages in the subnet to some maximum, given
by the number of permits in the system. Can such a scheme
be used for other aspects of DCS's such as scheduling processes on processors, or improving transaction response
time in a distributed database system by avoiding thrashing
situations?
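A sketch of the isarithmic scheme follows. It simplifies the description above in one respect: every regenerated permit rejoins the circulating pool, whereas in the scheme described, stationary permits are freed at their home host upon acknowledgment.

```python
import collections

class IsarithmicSubnet:
    """A fixed population of permits bounds the messages in the subnet."""
    def __init__(self, circulating, per_host):
        self.circulating = circulating                 # shared pool of permits
        self.local = collections.defaultdict(lambda: per_host)

    def send(self, host):
        """A host must hold a permit before injecting a message."""
        if self.local[host] > 0:
            self.local[host] -= 1                      # use a stationary permit
            return True
        if self.circulating > 0:
            self.circulating -= 1                      # grab a circulating permit
            return True
        return False                                   # subnet at its limit: wait

    def deliver(self):
        # The destination regenerates the permit on removing the message.
        self.circulating += 1

net = IsarithmicSubnet(circulating=1, per_host=1)
assert net.send("h1") and net.send("h1")   # stationary, then circulating
assert not net.send("h1")                  # no permits left for h1
net.deliver()
assert net.send("h2")                      # traffic can flow again
```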
Research on various topics in distributed databases [22],
[62] has been extensive. One of the main research issues is
concurrency control. Various algorithms for concurrency
control have appeared, including some based on distributed
control. One such algorithm is found in the Sirius-Delta
database system developed at INRIA [70]. In this system
integrity of the database is maintained by distributed controllers in the presence of concurrent users. The distributed controllers must somehow cooperate to achieve a systemwide
objective of good performance subject to the data integrity
constraint. In Sirius-Delta the cooperation is achieved by the
combined principles of atomic actions and unique timestamps. It can then be proven that the multiple distributed
controllers can operate concurrently and still meet the data
integrity constraint. Bernstein and Goodman [22] have also
shown that another class of algorithms based on Two Phase
Locking and atomic actions can also achieve this same cooperation. However, for other functions of distributed operating
systems such as scheduling, there is no requirement for the
data integrity constraint. In fact, as stated before, if such a
constraint were required for scheduling, then in general, performance of the system would suffer dramatically. Removing
the data integrity constraint from the scheduling algorithm
will improve its performance, but the control problem becomes much more difficult.
In the operating system arena, functions such as scheduling, deadlock detection, access control, and file servers are
candidates for being implemented via distributed control.
Consider an individual operating system function to be implemented by n distributed replicated entities (controllers).
For reliability we require that there is no master controller; in
other words each of the entities is considered equal (democratic) at all times. Furthermore, one of the most demanding
requirements is that most operating system functions must
run in real-time with minimum overhead (time sensitive).
This requirement eliminates many potential solutions based
on mathematical or dynamic programming. Central to the
development of distributed control functions is the notion of
what constitutes optimal control. However, such a notion for
dynamic, democratic, decentralized, replicated, adaptive,
stochastic, and time-sensitive functions has not yet been well
formulated. In fact, this is such a demanding set of requirements that there are no mathematical techniques that are
directly applicable.
To better illustrate the complexities involved, we discuss
the complexities in terms of the mathematical discipline of
team theory [54]. At a high level of abstraction team theory
appears to be a good candidate as a mathematical discipline
to base distributed control algorithms on because it contains
all the essential ingredients of the distributed control problem. The main ingredients are as follows.
1) The presence of different but correlated information for
each decision maker about some underlying uncertainty; and
2) the need for coordinated actions on the part of all decision makers in order to realize the systemwide payoff.
More formally, a team-theoretic decision problem contains five components [54].
1) A vector of random variables $\theta = [\theta_1, \theta_2, \ldots, \theta_m]$ with given distribution $p(\theta)$. The random vector represents all the uncertainties that have a bearing on the problem and is called the states of nature.
2) A set of observations $z = [z_1, z_2, \ldots, z_n]$ which are given functions of $\theta$, i.e., $z_i = \eta_i(\theta_1, \theta_2, \ldots, \theta_m)$, $i = 1, 2, \ldots, n$. In general $z_i$ is a vector and is the observation available to the $i$th decision maker.
3) A set of decision variables (or controls) $u = [u_1, u_2, \ldots, u_n]$, one per decision maker. The variables $\theta$, $z$, $u$ are assumed to take values in appropriate spaces $\Theta$, $Z$, $U$, respectively.
4) The strategy (decision rule) of the $i$th decision maker is a function $u_i = \gamma_i(z_i)$, where $\gamma_i$ is chosen from some admissible class of functions $\Gamma_i$.
5) The loss (payoff) criterion $\mathrm{LOSS} = L(u_1, u_2, \ldots, u_n, \theta_1, \theta_2, \ldots, \theta_m)$.
Finally, the team decision problem is: find $\gamma_i \in \Gamma_i$ in order to
$$\min J = E[L(u = \gamma(\eta(\theta)), \theta)] .$$
Note that this model is the simplest form of a team decision
and it yields a complicated optimization problem that, in general, could not be solved in real time on a DCS. The above
model is completely static (one team decision is made) and the
controls (actions) $u_i$ of each decision maker do not depend on
the actions of the other decision makers. Cooperation is only
occurring statically through the systemwide objective function and dynamically via changes in the states of nature. For
most DCS's, component 2) must be modified to $z_i = \eta_i(\theta, u)$.
In other words, the observations of a controller depend on
both the states of nature and the decisions of the other decision
makers. This team problem is unsolved today. In DCS functions, the problem is even more difficult than this unsolved
Team Theory problem because there are additional problems,
e.g., there are inherent delays in the system causing inaccuracies and eliminating the possibility of immediate response
to actions, and decisions must be made quickly. Further, team
theory does not directly deal with stability, an issue that is
fundamental to distributed control.
To have any hope of solving the distributed control problem
we need to either relax our optimization requirement or impose more structure to the problem or both. In effect, the
distributed database problem has imposed additional structure by requiring data integrity. In general, imposing additional structure includes: 1) requiring that each controller act
sequentially and also requiring the controller to know the
action and the result of any action of all previous controllers;
2) various n-step delay approaches [54]; 3) periodic coordination [67]; or 4) using a centralized coordinator. Even with
such simplifications, the specification of additional structure
does not guarantee that the resulting optimization problem is
solvable in practice. Even with this additional structure the
optimization problem can be too complex (and costly) to run
in real time for functions like scheduling and routing. Therefore, it is necessary to develop heuristics that can be run in
real time, and that can effectively coordinate distributed controllers. Even in heuristics, often the delayed effects of the
interactions are not considered. Furthermore, both iterative
solutions and keeping entire histories are not practical for
most DCS functions. For the scheduling problem there is the
added concern that it is difficult if not impossible to know the
direct systemwide effect of a particular action taken by a
controller. For example, assume that controller i takes action
"a and assume that the net effect of all the actions of all the
controllers improve the system. It cannot be assumed that
action "a" was a good action where, in fact, it may have been
a bad action dominated by the good actions of other controllers. This is an aspect of what is referred to as the "assignment
of credit problem."
In summary, there are many forms of distributed control.
Deciding which type is appropriate for each function in a
DCS is difficult. Deciding how distributed control algorithms of different types will interact with each other under
one system is even more complex. It is our hypothesis that the
advantages of designing the proper algorithms in the right
combinations will be improved performance, reliability, and
extensibility, the major potential benefits of DCS's. Consequently, distributed control is a crucial issue.
D. Reliability
Reliability is a fundamental issue for any system, but
DCS's are particularly well suited to the implementation of
reliability techniques. Reliability is one fundamental issue
where solutions have already been widely used across areas.
However, we believe that recent reliability techniques in the
database area should be better utilized in the operating system
and subnet levels. We begin the discussion on reliability with
a few definitions.
A fault is a mechanical or algorithmic defect which may
generate an error. A fault may be permanent, transient, or
intermittent. An error is an item of information which when
processed by the normal algorithms of the system will produce a failure. A failure is an event at which a system
violates its specifications. Reliability can then be defined as
the degree of tolerance against errors and faults. Increased
reliability comes from fault avoidance and fault tolerance.
Fault avoidance results from conservative design practices
such as the use of high reliability components.
Fault tolerance employs error detection and redundancy to
deal with faults and errors. Most of what we discuss in this
section relates to the fault tolerance aspect of reliability.
Reliability is a multidimensional activity that must simultaneously address some or all of the following: fault
confinement, fault detection, fault masking, retries, fault
diagnosis, reconfiguration, recovery, restart, repair, and
reintegration [109]. Rather than simply discussing each of
these in turn, we briefly discuss how these various issues are
treated at the subnet, operating system, programming language, and database levels.
Frames on a subnet typically contain error detection codes
such as the CRC. Conservative design may appear as topology design with "n" paths to each destination. Data link
protocols use handshaking techniques with positive feedback
in the form of ACK and NAK messages and where the NAK
messages may contain reasons for the failure. Timers are
used in the subnet to react to lost messages, lost control
tokens, or network partitionings. Most protocols will employ
retries to quickly overcome transient errors. Some subnets
create an abstraction called a virtual circuit that guarantees
the reliable transmission of messages. Flow control protocols
attempt to avoid congestion, lost messages due to buffer
overruns, and possible deadlock due to too much traffic and
not enough buffer space. Routing algorithms contain multiple routes to each destination or a method of generating a
new route given that failures occur. Alarms and other high
priority messages are used to identify dangerous situations
needing immediate attention. Many of the particular algorithms or protocols used are distributed to avoid single
points of failure. Other typical reliability techniques used
in any number of protocols include backup components,
voting, consistency and range checks, and special testing
procedures.
All the same techniques used in the subnet can also be used
in the DOS [20]. Reliable DOS's should also support replicated files [13], [45], exception handlers, testing procedures
run from remote hosts, and avoid single points of failure
by a combination of replication, backup facilities, and distributed control. Distributed control could be used for file
servers, name servers, scheduling algorithms, and other executive control functions. Process structure, how environment information is kept, the homogeneity of various hosts,
and the scheduling algorithm may allow for relocatability of
processes. Interprocess communication (IPC) might be
supported as a reliable remote procedure call [88], [108].
Reliable IPC would enforce "at least once" or "exactly once"
semantics depending on the type of IPC being invoked. Other
DOS reliability issues relate to invoking processes that
are not active, or attempting to communicate to terminated
processes, or the situation in which a process remains active
but is not used.
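As an illustration of these IPC semantics, the sketch below combines sender retries (at least once) with receiver-side duplicate filtering by message id (at most once) to obtain exactly-once processing over a lossy channel; all names are invented for the example.

```python
import random

class LossyChannel:
    """Models the unreliable subnet: a transmission may simply be dropped."""
    def __init__(self, loss=0.3, seed=1984):
        self.loss = loss
        self.rng = random.Random(seed)

    def transmit(self, deliver):
        return None if self.rng.random() < self.loss else deliver()

class ExactlyOnceReceiver:
    def __init__(self):
        self.seen = set()
        self.log = []

    def handle(self, msg_id, payload):
        if msg_id not in self.seen:        # duplicates filtered: at most once
            self.seen.add(msg_id)
            self.log.append(payload)
        return "ACK"                       # duplicates are still acknowledged

def send_exactly_once(chan, rcvr, msg_id, payload, max_tries=50):
    for _ in range(max_tries):             # retry until ACKed: at least once
        if chan.transmit(lambda: rcvr.handle(msg_id, payload)) == "ACK":
            return True
    return False

chan, rcvr = LossyChannel(), ExactlyOnceReceiver()
assert send_exactly_once(chan, rcvr, msg_id=1, payload="op")
assert send_exactly_once(chan, rcvr, msg_id=1, payload="op")  # a stale retry
assert rcvr.log == ["op"]                  # processed exactly once
```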
ARGUS [73], a distributed programming language, has
explicity incorporated reliability concerns into the programming language. It does this by supporting the idea of an
atomic object, transactions, nested actions, reliable remote
procedure calls, stable variables, guardians (which are modules that survive node failures and synchronize concurrent
access to data), exception handlers, periodic and background
testing procedures, and recovery of a committed update given
the present update does not complete. A distributed program
written in ARGUS may potentially experience deadlock.
Currently, deadlocks are broken by timing out and aborting
actions.
Distributed databases make use of many reliability features
such as stable storage, transactions, nested transactions [86],
commit and recovery protocols [112], nonblocking commit
protocols [34], [110], termination protocols [111], checkpointing, replication, primary/backups, logs/audit trails,
differential files [107], and timeouts to detect failures.
Operating system support is required to make these mechanisms more efficient [50], [131].
From the above list of database reliability features let us
consider termination and recovery protocols. A termination
protocol is used in conjunction with nonblocking commit
protocols and is invoked at failure detection time to guarantee
transaction atomicity. It attempts to terminate (commit or
abort) all affected transactions at all participating hosts without waiting for recovery. This is an extremely important
feature when it is necessary to allow as much continued
execution as possible (availability) in spite of the failure. A
host which has failed must then execute a recovery protocol
before it resumes communication with other hosts. The major
functions of the recovery protocol are to restart the system
processes, and to reestablish consistent transaction states for
all transactions affected by the failure, if this has not been
already accomplished by the termination protocol. This illustrates the close interaction that exists between the various
protocols where decisions made in one protocol make subsequent protocols easier or more difficult. It is obvious that
a termination (cleanup) and recovery protocol are required at
all levels in the system. Hence, specific algorithms (or ideas
from them) used at the database level might be applicable to
other levels as well and vice versa. For example, termination
and recovery protocols may themselves be distributed to enhance reliability. However, the distributed termination protocols typically require n(n - 1) messages during a round of
communication (and several rounds may be necessary) where
n is the number of participating entities. This is too costly for
slow networks, but it may be acceptable on fast local networks or within a single host. The benefits would be greater
reliability and better availability. Note that as was true for
concurrency protocols (Section III-B), further improvements
in efficiency and availability might be possible if these
termination and recovery protocols relax the serializability
requirement.
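The message cost noted above can be seen in a sketch of a single all-to-all exchange round; the decision rule applied to the collected verdicts is a deliberately simplified stand-in for a real termination protocol.

```python
def exchange_round(states):
    """states: host -> local verdict ('commit', 'abort', or 'unknown')."""
    n = len(states)
    messages = [(src, dst, states[src])
                for src in states for dst in states if src != dst]
    assert len(messages) == n * (n - 1)    # the per-round cost noted in the text
    # Simplified rule: any abort forces abort; unanimous commit allows commit.
    verdicts = set(states.values())
    if "abort" in verdicts:
        return "abort"
    return "commit" if verdicts == {"commit"} else "undecided"

assert exchange_round({"h1": "commit", "h2": "commit", "h3": "commit"}) == "commit"
assert exchange_round({"h1": "commit", "h2": "unknown"}) == "undecided"
```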
One aspect of reliability not stressed enough in DCS research is the need for robust solutions, i.e., the solutions
must explicitly assume an unreliable network, tolerate host
failures, network partitionings, and lost, duplicate, out of
order, or noisy data [27]. Robust algorithms must sometimes
make decisions after reaching only approximate agreement or by using statistical properties of the system (assumed known
or dynamically calculated). A related question is at what
level should the robust algorithms, and reliability in general,
be supported? Most systems attempt to have the subnet ensure reliable, error free data transmission between processes.
However, according to the end-to-end argument [104], such
functions placed at the lower levels of the system are often
redundant and unnecessary. The rationale for this argument
is that since the application has to take into account errors introduced not only by the subnet but by other components as well, many of the error detection
and recovery functions can be correctly and completely provided only at the application level.
The relationship of reliability to the other issues discussed
in this paper is very strong. For example, object oriented
systems confine errors to a large degree, define a consistent
system state to support rollback and restart, and limit propagation of rollback activities. Since objects can represent
unreliable resources (such as processors and disks), and since
higher level objects can be built using lower level objects, the
goal of reliable system design is to create "reliable" objects
out of unreliable objects. For example, a stable storage can
be created out of several disk objects and the proper logic.
Then a physical processor, a checkpointing capability, a
stable storage, and logic can be used to create a stable processor. One can proceed in this fashion to create a very
reliable system. The main drawback is potential loss of
execution time efficiency. For many systems it is just
too costly to incorporate an extensive number of reliability
mechanisms. Reliability is also enhanced by proper access
control and judicial use of distributed control, two other
fundamental issues discussed in this paper. The major challenge is to integrate solutions to all these issues in a cost
effective manner and produce an extremely reliable system.
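Returning to the stable storage example above, the following sketch builds a "reliable" read/write object from two unreliable disk objects using duplicated blocks and checksums. It compresses the classical construction considerably and is an illustration under those assumptions, not a production design.

```python
import zlib

class UnreliableDisk:
    """A disk object whose block may be lost or corrupted."""
    def __init__(self):
        self.block = None                    # (data, checksum) or garbage

    def write(self, data):
        self.block = (data, zlib.crc32(data))

    def read(self):
        return self.block

class StableStorage:
    """A higher level 'reliable' object built from unreliable disk objects."""
    def __init__(self):
        self.disks = [UnreliableDisk(), UnreliableDisk()]

    def write(self, data):
        for d in self.disks:                 # write the copies one at a time
            d.write(data)

    def read(self):
        for d in self.disks:                 # first copy that checks out wins
            block = d.read()
            if block and zlib.crc32(block[0]) == block[1]:
                return block[0]
        raise IOError("both copies lost")    # only if every disk failed

store = StableStorage()
store.write(b"consistent system state")
store.disks[0].block = (b"garbage", 0)       # corrupt one copy
assert store.read() == b"consistent system state"
```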
E. Heterogeneity
Incompatibility problems arise in heterogeneous DCS's in
a number of ways [14], [17], [71] and at all levels. First,
incompatibility is due to the different internal formatting
schemes that exist in a collection of different communication
and host processors. Second, incompatibility also arises from
the differences in communication protocols and topology
when networks are connected to other networks via gateways. Third, major incompatibilities arise due to different
operating systems, file servers, and database systems that
might exist on a (set of) network(s).
The easiest solution to this general problem for a single
DCS is to avoid the issue by using a homogeneous collection
of machines and software. If this is not practical, then some
form of translation is necessary. Some earlier systems left
this translation to the user. This is no longer acceptable.
Translation done by the DCS system can be done at the
receiver host or at the source host. If it is done at the receiver
host then the data traverse the network in their original form.
The data usually are supplemented with extra information to
guide the translation. The problem with this approach is that
at every host there must be a translator to convert each format
in the system to the format used on the receiving host. When
there exist "n" different formats, this requires the support of
(n - 1) translators at each host. Performing the translation at
the source host before transmitting the data is subject to all
the same problems.
There are two better solutions, each applicable under different situations: an intermediate translator, or an intermediate standard data format.
An intermediate translator accepts data from the source
and produces the acceptable format for the destination. This
is usually used when the number of different types of necessary conversions is small. For example, a gateway linking
two different networks acts as an intermediate translator.
For a given conversion problem, if the number of different
types to be dealt with grows large, then a single intermediate
translator becomes unmanageable. In this case, an intermediate standard data format (interface) is declared, hosts
convert to the standard, data are moved in the format of the
standard, and another conversion is performed at the destination. By choosing the standard to be the most common
format in the system, the number of conversions can be
reduced.
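The translator-count argument above can be made explicit: with n formats, pairwise translation needs on the order of n(n - 1) translators, while an intermediate standard needs roughly 2n converters (one to-standard and one from-standard per format). The sketch below records this arithmetic; the helper names and example converters are hypothetical.

```python
def pairwise_translator_count(n):
    return n * (n - 1)        # one per ordered (source, destination) pair

def standard_format_converter_count(n):
    return 2 * n              # to-standard and from-standard per format

def translate_via_standard(data, to_standard, from_standard):
    """Source converts to the standard; destination converts from it."""
    return from_standard(to_standard(data))

assert pairwise_translator_count(10) == 90
assert standard_format_converter_count(10) == 20
# e.g., a hypothetical lower-case-only host receiving mixed-case text,
# with upper case standing in for the intermediate standard format:
assert translate_via_standard("MiXeD", str.upper, str.lower) == "mixed"
```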
At a high level of abstraction the heterogeneity problem
and the necessary translations are well understood. However,
implementing the translators in a cost effective way has not
been achieved in general. Complicated issues are precision
loss, format incompatibilities (e.g., minus zero value in sign
magnitude and 1's complement cannot be represented in 2's
complement), data type incompatibilities (e.g., mapping of
an upper/lower case terminal to an upper case only terminal
is a loss of information), efficiency concerns, the number and
locations of the translators, and what constitutes a good intermediate data format for a given incompatibility problem.
As DCS's become more integrated one can expect that both
programs and complicated forms of data might be moved to
heterogeneous hosts. How will a program run on this host
given that the host has different word lengths, different machine code, and different operating system primitives? How
will database relations stored as part of a CODASYL database be converted to a relational model and its associated
storage scheme? Moving a data structure object requires
knowledge about the semantics of the structure (e.g., that
some of the fields are pointers and these have to be updated
upon a move). How should this information be imparted to
the translators, what are the limitations if any, and what are
the benefits and costs of having this kind of flexibility? In
general, the problem of providing translation for movement
of data and programs between heterogeneous hosts and networks has not been solved. The main problem is ensuring that
such programs and data are interpreted correctly at the destination host. In fact, the more difficult problems in this area
have been largely ignored.
It is inevitable that incompatibilities will exist in DCS's
because it is quite natural to extend such systems by interconnecting networks, by adding new hosts and communication processors, and by increasing functionality with new
software. Further, the main function of NOS's and file servers is precisely to present a uniform logical interface (view)
to the end user, from a collection of different environments.
Depending on the degree of incompatibility, the cost of the
translations can be high, thereby limiting their application in
DCS's which have severe real-time constraints. While real-time systems should also be as extensible as possible, they
will probably have to rely on a few translators, on good
object-based design and on extensible distributed control algorithms rather than on being able to incrementally add more
and more translators. For all DCS's, a method is needed to
limit the number of incompatibilities during the lifetime of
the system, while allowing significant and easy extensibility.
The problems associated with heterogeneity are currently
considered less important than the other problems considered
in this paper because they can be handled on a problem by
problem basis. However, as DCS's become more sophisticated and approach achieving their full potential, then we
believe the heterogeneity issue will become increasingly important and a problem by problem solution may not work.
F. Efficiency
Distributed computer systems are meant to be efficient in
a multitude of ways. Resources (files, compilers, debuggers,
and other software products) developed at one host can be
shared by users on other hosts limiting duplicate efforts.
Expensive hardware resources can also be shared minimizing
costs. Communication facilities such as electronic mail and
file transfer protocols also improve efficiency by enabling
better and faster transfer of information. The multiplicity of
processing elements might also be exploited to improve response time and throughput of user processes. While efficiency concerns exist at every level in the system, they
must also be treated as an integrated "system" level issue. For
example, a good design, the proper tradeoffs between levels,
and the paring down of over-ambitious features usually improves efficiency. In this section, however, we concentrate
on discussing efficiency as it relates to the execution time of
processes.
Once the system is operational, improving response time
and throughput of user processes is largely the responsibility of scheduling and resource management algorithms
[28], [30], [35], [95], [124], [125], [127]-[129]. The scheduling algorithm is intimately related to the resource allocator
because a process will not be scheduled for the CPU if it is
waiting for a resource. If a DCS is to exploit the multiplicity
of processors and resources in the network it must contain
more than "n" independent local schedulers. The local schedulers must interact and cooperate and the degree to which this
occurs can vary widely. We suggest that a good scheduling
algorithm for a DCS will be a heuristic that acts like an
"expert system." This expert system's task is to effectively
utilize the resources of the entire distributed system given a
complex and dynamically changing environment. We hope to
illustrate this in the following discussion.
In the remainder of this section when we refer to the scheduling algorithm we are referring to the part of the scheduler
(the expert system) that is responsible for choosing the host
of execution for a process. We assume that there is another
part of the scheduler which assigns the local CPU to the
highest priority ready process.
We divide the characteristics of a DCS which influence
response time and throughput into two types: 1) system
characteristics, and 2) scheduling algorithm characteristics.
System characteristics include: the number, type, and speed
of processors, the allocation of data and programs [29],
whether data and programs can be moved, the amount and
location of replicated data and programs, how data are partitioned, partitioned functionality in the form of dedicated
processors, any special purpose hardware, characteristics of
the communication subnet, and special problems of distribution such as no central clock and the inherent delays in the
system. A good scheduling algorithm would take the system
characteristics into account. Scheduling algorithm characteristics include: the type and amount of state information used,
how and when that information is transmitted, how that information is used (degree and type of cooperation between
distributed scheduling entities), when the algorithm is invoked, adaptability of the algorithm, and the stability of
the algorithm.
By the type of state information, it is meant whether the
algorithm uses queue lengths, CPU utilization, amount of
free memory, estimated average response time, etc., in making its scheduling decision. The type of information also
refers to whether the information is local or networkwide
information. For example, a scheduling algorithm on host 1
could use queue lengths of all the hosts in the network in
making its decision. The amount of state information refers
to the number of different types of information used by the
scheduler.
Information used by a scheduler can be transmitted
periodically or asynchronously. If asynchronously, it may be
sent only when requested (as in bidding), it may be piggybacked on other messages between hosts, or it may be sent
only when conditions change by some amount. The information may be broadcast to all hosts, sent to neighbors only, or
to some specific set of hosts.
The information is used to estimate the loads on other hosts
of the network in order to make an informed global scheduling decision. However, the data received are out of date and
even the ordering of events might not be known [64]. It is
necessary to manipulate the data in some way to obtain better
estimates. Several examples are: very old data can be discarded; given that state information is timestamped, a linear
estimation of the state extrapolated to the current time might
be feasible; conditional probabilities on the accuracy of the
state information might be calculated in parallel with the
scheduler by some monitor nodes and applied to the received
state information; the estimates can be some function of the
age of the state information; or some form of (iterative) message interchange might be feasible. A message interchange is
subject to long delays before the scheduling decision is made,
and if mutual agreement among scheduling entities is necessary even in the presence of failures, then the interchange is
also prone to the Byzantine Generals Problem [33], [65]. The
Byzantine Generals Problem is a particularly disruptive type
of problem where no assumptions can be made about the type
of failure of a process involved in message exchanges. For
example, the failed process can send messages when it is not
supposed to, can make conflicting claims to other processes,
or act dead for a while and then revive. Even though it is hard
to believe that a process would act like this on purpose, in
practice, systems fail in unexpected ways giving rise to these
kinds of behavior. Protecting against this type of failure is a
conservative approach to reliable systems design.
Before a process is actually moved the cost of moving it
must be accounted for in determining the estimated benefit of
the move. This cost is different if the process has not yet
begun execution than if it is already in progress. In both
cases, the resources required must also be considered. If a
process is in execution, then environment information (e.g.,
the process control blocks) probably should be moved with
the process. It is expected that in many cases the decision will
be not to move the process.
Schedulers invoked too often will produce excessive overhead. If they are not invoked often enough they will not be
able to react fast enough to changing conditions. There will
be undue startup delay for processes. There must be some
ideal invocation schedule which is a function of the load.
In a complicated DCS environment it can be expected that
the scheduler will have to be quite adaptive [12], [24], [123].
A scheduler might make minor adjustments in weighing the
importance of various factors as the network state changes in
an attempt to track a slowly changing environment. Major
changes might require major adjustments. For example, under very light loads there does not seem to be much justification for networkwide scheduling, so the algorithm might be
turned off, except the part that can recognize a change in
the load. At moderate loads, the full blown scheduling algorithm might be employed. This might include individual
hosts refusing all requests for information and refusing to
accept any process because it is too busy. Under heavy loads
on all hosts it again seems unnecessary to use networkwide
scheduling. A bidding scheme might use both source and
server directed bidding [113], [125]. An overloaded host asks
for bids and is the source of work for some other host in the
network. Similarly, a lightly loaded host may make a reverse
bid, i.e., it asks the rest of the network for some work. The
two types of bidding might coexist. Schedulers could be
designed in a multilevel fashion with decisions being made at
different rates, e.g., local decisions and state information
updates occur frequently, but more global exchanges of decisions and state information might proceed at a slower rate
because of the inherent cost of these global actions.
Stability [25] refers to the situation where processes are
moved among hosts in the network in an incremental and
orderly fashion. It is not acceptable for N - 1 hosts to flood
a lightly loaded host in such a manner that the previously
lightly loaded host must now reassign some of the work
moved to it. (Some form of hysteresis is required.) Scheduling algorithms can employ implicit or explicit stability
mechanisms. An implicit mechanism exists when the algorithm is tuned so that the relative importance of the various
factors used, the relative timings of the scheduling algorithm,
the passing of state information, the characteristics of the
processes in the system, the adaptability of the algorithm,
etc., are all integrated in the right proportion to provide a
stable system. Implicit treatment of stability can be dangerous, but requires less overhead than explicit mechanisms.
Explicit mechanisms refer to specific logic and information
that is used to better guarantee stability. The overheads of
explicit mechanisms can be very high and as one tries to
lower the overheads, stability becomes jeopardized. It is not
clear which technique is better.
An important part of efficiency is adequate measurement
techniques. Many of the issues raised above require measurement methods. It might be necessary to measure the delay in the subnet, or between two hosts, or the utilization of
a host, or the probabilities of certain conditions in the distributed system. The cost of the measurement must be weighed
against the benefits it produces.
A classic efficiency question in any system is: what should
be supported by the kernel, or more generally by the operating system, and what should be left to the user? The trend in
DCS is to support objects, primitive IPC mechanisms, and
processes in the kernel [115]. Some researchers advocate
supporting the concept of a transaction in the kernel. This
argument will never be settled conclusively since it is a function of the requirements, type of processes running, etc. This
is the classical Vertical Migration question [119].
Of course, many other efficiency questions remain at all
levels including those briefly discussed throughout the previous sections of this paper. These include: the efficiency of
the object model, the end-to-end argument [104], locking
granularity, performance of remote operations, improvements due to distributed control, the cost effectiveness of
various reliability mechanisms, and efficiently dealing with
heterogeneity. Efficiency is, therefore, not a separate issue
but must be addressed for each issue in order to result in an
efficient, reliable, and extensible DCS. A difficult question
to answer is exactly what is acceptable performance given
that multiple decisions are being made at all levels and that
these decisions are being made in the presence of missing and
inaccurate information.
IV. SUMMARY
While it is true that DCS's have proliferated, it is also true
that there remain many unsolved problems relating to the
issues of the object model, access control, distributed control, reliability, heterogeneity, and efficiency as well as their
interactions. These fundamental issues have been recognized
for some time, but solutions to problems associated with
these issues have not produced totally satisfactory systems.
We will not achieve the full potential advantages of DCS's
until better experimental evidence is obtained and until current and new solutions are better integrated in a systemwide,
flexible, and cost effective manner. Major breakthroughs will
be required to achieve this potential. It is our opinion that
such breakthroughs will be largely based on distributed decision making that will necessarily use heuristics similar to
those found in "expert" systems. The heuristics, though, will
have to directly address the problems of distribution, such as
long delays, the assignment of credit, missing and out-of-date information, the use of statistical information, and failure events. Further, the heuristics will have to deal with such
complexity very efficiently, and this eliminates many classical solutions. These complications are not typically found
in expert systems to date, so it is not possible to simply
"borrow" the solution. To achieve the objectives of DCS's, it
is important to study distributed resource management paradigms that view resources at an integrated "system" level
which includes the hardware, the communication subnet, the
operating system, the database system, the programming language, and other software resources. To this end, this paper
has tried to take a system viewpoint in presenting six fundamental issues. A DCS of the future will be a form of an
extensible, adaptable, physically distributed, but logically
integrated, expert system.
In summary, this paper has presented a brief overview of
distributed computer systems, and then discussed some of the
problems and solutions for each of six fundamental distributed systems issues. The paper has described (by means of
examples from the communication subnet, distributed operating system, and distributed database areas) the interactions
among these issues, and the need for better integration of
solutions. Important issues that this paper has not covered
due to lack of space are the need for a theory and specification
language for distributed systems, as well as the need for a
distributed systems methodology.
REFERENCES
[1] G. R. Andrews and F. Schneider, "Concepts and notations for concurrent programming," ACM Comput. Surveys, vol. 15, no. 1, Mar. 1983.
[2] P. Green, "An introduction to network architectures and protocols,"
IEEE Trans. Commun., vol. COM-28,-Apr. 1980.
[3] L. Kleinrock and M. Gerla, "Flow control: A comparative survey,"
IEEE Trans. Commun., vol. COM-28, Apr. 1980.
1113
[4] M. Schwartz and T. E. Stern, "Routing techniques used in computer
communication networks," IEEE Trans. Commun., vol. COM-28,
Apr. 1980.
[5] D. Walden and A. McKenzie, "The evolution of host to host protocol
technology," IEEE Computer, vol. 12, Sept. 1979.
[6] D. W. Davies and D. L. A. Barber, Communication Networks For
Computers. New York: Wiley, 1973.
[7] D. W. Davies, D. L.A. Barber, W. L. Price, and C. M. Solomonides,
Computer Networks and Their Protocols. New York: Wiley, 1979.
[8] W. R. Franta and I. Chlamtac, Local Networks. Lexington,
MA: Lexington Books, 1981.
[9] J. Martin, Computer Networks and Distributed Processing.
Englewood Cliffs, NJ: Prentice-Hall, 1981.
[10] J. McNamara, Technical Aspects of Data Communication. Maynard,
MA: Digital, 1977.
[11] C. Weitzman, Distributed Micro/Minicomputer Systems. Englewood
Cliffs, NJ: Prentice-Hall, 1980.
[12] A. K. Agrawala, S. K. Tripathi, and G. Ricart, "Adaptive routing using
a virtual waiting time technique," IEEE Trans. Software Eng.,
vol. SE-8, Jan. 1982.
[13] P. Alsberg and J. Day, "A principle for resilient sharing of distributed
resources," in Proc. 2nd Int. Conf. Software Eng., 1976.
[14] B. Anderson et al., "Data reconfiguration service," Bolt Beranek and
Newman, Tech. Rep., May 1971.
[15] Apollo Domain Architecture, Apollo Computer, Inc., Feb. 1981.
[16] J. M. Ayache, J. P. Courtiat, and M. Diaz, "REBUS, A fault tolerant
distributed system for industrial control," IEEE Trans. Comput.,
vol. C-31, July 1982.
[17] M. Bach, N. Coguen, and M. Kaplan, "The ADAPT system: A generalized approach towards data conversion," in Proc. 5th Int. Conf. Very
Large Data Bases, Rio de Janeiro, Brazil, Oct. 1979.
[18] J. E. Ball, J. Feldman, J. Low, R. Rashid, and P. Rovner, "RIG,
Rochester's intelligent gateway: System overview," IEEE Trans.
Software Eng., vol. SE-2, no. 4, Dec. 1980.
[19] D. K. Barclay, E. R. Byrne, and F. K. Ng, "A real-time database management system for No. 5 ESS," Bell Syst. Tech. J., vol. 61, no. 9,
Nov. 1982.
[20] J. N. Bartlett, "A non-stop operating system," in Proc. 11th Hawaii Int.
Conf. Syst. Sci., Jan. 1978.
[21] P. A. Bernstein, D. W. Shipman, and J. B. Rothnie, Jr., "Concurrency
control in a system for distributed databases (SDD-1)," ACM Trans.
Database Syst., vol. 5, no. 1, pp. 18-25, Mar. 1980.
[22] P. Bernstein and N. Goodman, "Concurrency control in distributed
database systems," ACM Comput. Surveys, vol. 13, no. 2, June 1981.
[23] A. Birrell, R. Levin, R. Needham, and M. Schroeder, "Grapevine:
An exercise in distributed computing," Commun. ACM, vol. 25,
pp. 260-274, Apr. 1982.
[24] S. H. Bokhari, "Dual processor scheduling with dynamic reassignment," IEEE Trans. Software Eng., vol. SE-5, no. 4, July 1979.
[25] R. M. Bryant and R. A. Finkel, "A stable distributed scheduling algorithm," in Proc. 2nd Int. Conf. Distrib. Comput. Syst., Apr. 1981.
[26] L. Casey and N. Shelness, "A domain structure for distributed computer
system," in Proc. 6th ACM Symp. Oper. Syst. Princ., Nov. 1977,
pp. 101-108.
[27] T. C. K. Chou and J. A. Abraham, "Load redistribution under failure in
distributed systems," IEEE Trans. Comput., vol. C-32, pp. 799-808,
Sept. 1983.
, "Load balancing in distributed systems," IEEE Trans. Software
[28]
Eng., vol. SE-8, no. 4, July 1982.
[29] W. W. Chu, "Optimal file allocation in a multiple computing system,"
IEEE Trans. Comput., vol. C-18, pp. 885-889, Oct. 1969.
[30] W. W. Chu, L. J. Holoway, M. Lan, and K. Efe, "Task allocation in
distributed data processing," IEEE Computer, vol. 13, pp. 57-69,
Nov. 1980.
[31] D. W. Davies, E. Holler, E. D. Jensen, S. R. Kimbleton, B. W. Lampson, G. LeLann, K. J. Thurber, and R. W. Watson, Distributed
Systems -Architecture and Implementation, Vol. 105, Lecture Notes in
Computer Science. Berlin: Springer-Verlag, 1981.
[32] J. Dion, "The Cambridge file server," ACM Oper. Syst. Rev.,
Oct. 1980.
[33] D. Dolev, "The Byzantine generals strike again," J. Algorith., vol. 3,
no. 1, 1982.
[34] C. Dwork and D. Skeen, "The inherent cost of nonblocking commitment," Dep. Comput. Sci., Cornell Univ., Ithaca, NY, Tech. Rep.,
May 1983.
[35] K. Efe, "Heuristic models of task assignment scheduling in distributed
systems," IEEE Computer, vol. 15, June 1982.
[36] P. Enslow, "What is a distributed data processing system," IEEE
Computer, vol. 11, Jan. 1978.
[37] P. Enslow and T. Saponas, "Distributed and decentralized control in a
fully distributed processing system," Tech. Rep. GIT-ICS-81/82, Sept. 1980.
[38] D. J. Farber et al., "The distributed computer system," in Proc. 7th Annu. IEEE Comput. Soc. Int. Conf., Feb. 1973.
[39] W. D. Farmer and E. E. Newhall, "An experimental distributed switching system to handle bursty computer traffic," in Proc. ACM Symp. Probl. Opt. Data Commun. Syst., 1969.
[40] R. A. Floyd and C. S. Ellis, "The ROE file system," in Proc. 3rd Symp. Reliability Distrib. Software Database Syst., Oct. 1983.
[41] H. C. Forsdick, R. E. Schantz, and R. H. Thomas, "Operating systems for computer networks," IEEE Computer, vol. 11, Jan. 1978.
[42] A. G. Fraser, "Spider: An experimental data communications system," Bell Labs., Tech. Rep., 1975.
[43] M. Fridrich and W. Older, "The FELIX file server," in Proc. 8th Symp. Oper. Syst. Princ. (SIGOPS), Dec. 1981, pp. 37-44.
[44] R. Gallager, "A minimum delay routing algorithm using distributed computation," IEEE Trans. Commun., vol. COM-25, Jan. 1977.
[45] H. Garcia-Molina, "Reliability issues for fully replicated distributed databases," IEEE Computer, vol. 16, pp. 34-42, Sept. 1982.
[46] D. Gifford, "Weighted voting for replicated data," in Proc. 7th Symp. Oper. Syst. Princ., Dec. 1979, pp. 150-159.
[47] ——, "Violet: An experimental decentralized system," Oper. Syst. Rev., vol. 13, no. 5, Dec. 1979.
[48] V. D. Gligor and S. H. Shattuck, "On deadlock detection in distributed systems," IEEE Trans. Software Eng., vol. SE-6, no. 5, pp. 435-440, Sept. 1980.
[49] J. N. Gray, R. A. Lorie, and G. R. Putzolu, "Granularity of locks in a shared database," in Proc. Int. Conf. Very Large Data Bases, Sept. 1975, pp. 428-451.
[50] J. N. Gray, "Notes on data base operating systems," in Operating Systems: An Advanced Course. Berlin: Springer-Verlag, 1979.
[51] ——, "The transaction concept: Virtues and limitations," in Proc. Int. Conf. Very Large Data Bases, Sept. 1981, pp. 144-154.
[52] M. Guillemont, "The Chorus distributed operating system: Design and implementation," in Proc. Int. Symp. Local Comput. Networks, Florence, Italy, Apr. 1982.
[53] J. Hamilton, "Functional specification of the WEB kernel," DEC R&D Group, Maynard, MA, Nov. 1978.
[54] Y.-C. Ho, "Team decision theory and information structures," Proc. IEEE, vol. 68, June 1980.
[55] R. A. Jarvis, "Optimization strategies in adaptive control: A selective survey," IEEE Trans. Syst., Man, Cybern., vol. SMC-5, Jan. 1975.
[56] E. D. Jensen, "The Honeywell experimental distributed processor: An overview of its objective, philosophy and architectural facilities," IEEE Computer, vol. 11, Jan. 1978.
[57] E. D. Jensen and N. Pleszkoch, "ArchOS: A physically dispersed operating system," Distrib. Processing Tech. Comm. Newsletter, Summer 1984.
[58] A. K. Jones, "The object model: A conceptual tool for structuring software," in Lecture Notes in Computer Science, Vol. 60. Berlin: Springer-Verlag, 1978.
[59] A. K. Jones, R. J. Chansler, Jr., I. Durham, K. Schwans, and S. R. Vegdahl, "StarOS, A multiprocessor operating system for the support of task forces," in Proc. 7th Symp. Oper. Syst. Princ., Dec. 1979.
[60] K. C. Kahn et al., "iMax: A multiprocessor operating system for an object-based computer," in Proc. 8th Symp. Oper. Syst. Princ., Dec. 1981, pp. 14-16.
[61] S. R. Kimbleton, H. M. Wood, and M. L. Fitzgerald, "Network operating systems: An implementation approach," in Proc. AFIPS Conf., 1978.
[62] W. Kohler, "A survey of techniques for synchronization and recovery in decentralized computer systems," ACM Comput. Surveys, vol. 13, no. 2, June 1981.
[63] H. T. Kung and J. T. Robinson, "On optimistic methods for concurrency control," ACM Trans. Database Syst., vol. 6, no. 2, June 1981.
[64] L. Lamport, "Time, clocks, and the ordering of events in a distributed system," Commun. ACM, vol. 21, July 1978.
[65] L. Lamport, R. Shostak, and M. Pease, "The Byzantine generals problem," ACM Trans. Programming Lang. Syst., vol. 4, no. 3, July 1982.
[66] B. Lampson, "Atomic transactions," in Lecture Notes in Computer Science, Vol. 105, B. W. Lampson, M. Paul, and H. J. Siegert, Eds. Berlin: Springer-Verlag, 1980, pp. 365-370.
[67] R. E. Larsen, Tutorial: Distributed Control, IEEE Catalog No. EHO 153-7. New York: IEEE Press, 1979.
[68] E. Lazowska, H. Levy, G. Almes, M. Fischer, R. Fowler, and S. Vestal, "The architecture of the Eden system," in Proc. 8th Annu. Symp. Oper. Syst. Princ., Dec. 1981.
[69] G. LeLann, "Algorithms for distributed data-sharing systems which use tickets," in Proc. 3rd Berkeley Workshop Distrib. Databases Comput. Networks, 1978.
[70] ——, "A distributed system for real-time transaction processing," IEEE Computer, vol. 14, Feb. 1981.
[71] P. H. Levine, "Facilitating interprocess communication in a heterogeneous network environment," Masters thesis, Massachusetts Inst. Technol., Cambridge, MA, June 1977.
[72] B. Lindsay, "Object naming and catalogue management for a distributed database manager," IBM Res. Rep. RJ2914, Aug. 1980.
[73] B. Liskov and R. Scheifler, "Guardians and actions: Linguistic support for robust, distributed programs," in Proc. 9th Symp. Princ. Programming Lang., Jan. 1982, pp. 7-19.
[74] M. T. Liu, D. Tsay, C. Chou, and C. Li, "Design of the distributed double-loop computer network (DDLCN)," J. Digital Syst., vol. V, no. 12, 1981.
[75] G. W. R. Luderer et al., "A distributed UNIX system based on a virtual circuit switch," in Proc. 8th Symp. Oper. Syst. Princ., Dec. 1981.
[76] J. R. McGraw and G. R. Andrews, "Access control in parallel programs," IEEE Trans. Software Eng., vol. SE-5, Jan. 1979.
[77] M. S. McKendry, J. E. Allchin, and W. C. Thibault, "Architecture for a global operating system," in Proc. IEEE INFOCOM, Apr. 1983.
[78] J. M. McQuillan and D. C. Walden, "The ARPA network design decisions," Comput. Networks, vol. 1, Aug. 1977.
[79] J. M. McQuillan, I. Richer, and E. C. Rosen, "The new routing algorithm for the ARPANET," IEEE Trans. Commun., vol. COM-28, May 1980.
[80] A. Meijer and P. Peeters, Computer Network Architectures. Rockville, MD: Computer Science Press, 1982.
[81] P. M. Melliar-Smith and R. L. Schwartz, "Formal specification and mechanical verification of SIFT," IEEE Trans. Comput., vol. C-31, July 1982.
[82] D. A. Menasce and R. R. Muntz, "Locking and deadlock detection in distributed data bases," IEEE Trans. Software Eng., vol. SE-5, no. 3, May 1979.
[83] P. M. Merlin and A. Segall, "A failsafe distributed routing protocol," IEEE Trans. Commun., vol. COM-27, Sept. 1979.
[84] R. M. Metcalf and D. Boggs, "Ethernet: Distributed packet switching for local computer networks," Commun. ACM, vol. 19, July 1976.
[85] J. G. Mitchel and J. Dion, "A comparison of two network-based file servers," Commun. ACM, vol. 25, pp. 233-245, Apr. 1982.
[86] J. E. B. Moss, "Nested transactions and reliable distributed computing,"
in Proc. 2nd Symp. Reliability Distrib. Software Database Syst.,
July 1982.
[87] R. M. Needham and A. J. Herbert, The Cambridge Distributed
Computing System. London: Addison-Wesley, 1982.
"Remote procedure call," Xerox Corp., Tech. Rep. CSL[88] B.81-9,J. Nelson,
May 1981.
agent
[89] D. Oppen and Y. K. Dalal, "The clearinghouse: A decentralized
for locating named objects in a distributed environment," Xerox Corp.,
Office Products Div. Rep. OPD-T8103, Oct. 1981.
[90] J. Ousterhout, D. Scelza, and P. Dindhu, "Medusa: An experiment in
distributed operating system structure," Commun. ACM, vol. 23,
Feb. 1980.
[91] J. Pierce, "How far can data loops go," IEEE Trans. Commun., vol. COM-20, June 1972.
[92] G. Popek et al., "LOCUS, A network transparent, high reliability distributed system," in Proc. 8th Symp. Oper. Syst. Princ., Dec. 1981, pp. 14-16.
[93] L. Pouzin, "Presentation and major design aspects of the Cyclades computer network," in Proc. 3rd ACM Data Commun. Conf., Nov. 1973.
[94] M. L. Powell and B. P. Miller, "Process migration in DEMOS/MP," in Proc. 9th Symp. Oper. Syst. Princ., Oct. 1983.
[95] K. Ramamritham and J. A. Stankovic, "Dynamic task scheduling in distributed hard real-time systems," IEEE Software, vol. 1, no. 3, July 1984.
[96] B. Randell, "Recursively structured distributed computing systems," in Proc. 3rd Symp. Reliability Distrib. Software Database Syst., Oct. 1983.
[97] R. Rashid, "An inter-process communication facility for UNIX," Carnegie-Mellon Univ., Pittsburgh, PA, Tech. Rep., June 1980.
[98] R. F. Rashid and G. G. Robertson, "Accent: A communication oriented network operating system kernel," in Proc. 8th Symp. Oper. Syst. Princ., Dec. 1981.
[99] D. R. Ries and M. R. Stonebraker, "Locking granularity revisited,"
ACM Trans. Database Syst., pp. 210-227, June 1979.
[100] D. J. Rosenkrantz, R. E. Stearns, and P.M. Lewis, "System level
concurrency control for distributed database systems," ACM Trans.
Database Syst., vol. 3, no. 2, June 1978.
[101] J. B. Rothnie, Jr., P. A. Bernstein, S. Fox, N. Goodman, M. Hammer,
T. A. Landers, C. Reeve, D. W. Shipman, and E. Wong, "Introduction
to a system for distributed databases (SDD-1)," ACM Trans. Database
Syst., vol. 5, no. 1, pp. 1-17, Mar. 1980.
[102] L. A. Rowe and K. P. Birman, "A local network based on the UNIX
operating system," IEEE Trans. Software Eng., vol. SE-8, no. 2,
Mar. 1982.
[103] J. H. Saltzer, "Naming and binding of objects," Operating Systems: An
Advanced Course. Berlin: Springer-Verlag, 1978.
[104] J. H. Saltzer, D. P. Reed, and D. D. Clark, "End-to-end arguments in
system design," in Proc. 2nd Int. Conf. Distrib. Comput. Syst.,
Apr. 1981.
[105] N. Sandell, P. Varaiya, M. Athans, and M. Safonov, "Survey of decentralized control methods for large scale systems," IEEE Trans. Auto.
Cont., vol. AC-23, no. 2, Apr. 1978.
[106] A. Segall, "The modelling of adaptive routing in data-communication
networks," IEEE Trans. Commun., vol. COM-25, no. 1, pp. 85-95,
Jan. 1977.
[107] D. G. Severance and G. M. Lohman, "Differential files: Their application to the maintenance of large databases," ACM Trans. Database
Syst., vol. 1, no. 3, Sept. 1976.
[108] S. K. Shrivastava and F. Panzieri, "The design of a reliable remote
procedure call mechanism," IEEE Trans. Comput., vol. C-31,
July 1982.
[109] D. Siewiorek and R. Swarz, The Theory and Practice of Reliable System
Design. Bedford, MA: Digital, 1982.
[110] D. Skeen, "Nonblocking commit protocols," in Proc. ACM SIGMOD,
1981.
[111] ——, "A decentralized termination protocol," in Proc. 1st IEEE Symp. Reliability Distrib. Software Database Syst., 1981.
[112] D. Skeen and M. Stonebraker, "A formal model of crash recovery in a
distributed system," IEEE Trans. Software Eng., vol. SE-9, no. 3,
May 1983.
[113] G. R. Smith, "The contract net protocol: High level communication and
control in a distributed problem solver," IEEE Trans. Comput.,
vol. C-29, Dec. 1980.
[114] M. H. Solomon and R. A. Finkel, "The Roscoe distributed operating
system," in Proc. 7th Symp. Oper. Syst. Princ., Mar. 1979.
[115] A. Z. Spector, "Performing remote operations efficiently on a local
computer network," Commun. ACM, vol. 25, pp. 246-259, Apr. 1982.
[116] A. Z. Spector and P.M. Schwarz, "Transactions: A construct for
reliable distributed computing," ACM Oper. Syst. Rev., vol. 17, no. 2,
Apr. 1983.
[117] S. K. Shrivastava, "On the treatment of orphans in a distributed system,"
in Proc. 3rd Symp. Reliability Distrib. Syst., Oct. 1983.
[118] W. Stallings, Local Networks. New York: Macmillan, 1984.
[119] J. A. Stankovic, "The types and interactions of vertical migrations of
functions in a multi-level interpretive system," IEEE Trans. Comput.,
vol. C-30, July 1981.
[120] ——, "Improving system structure and its affect on vertical migration,"
Microprocessing and Microprogramming, vol. 8, no. 3,4,5,
Dec. 1981.
[121] ——, "ADCOS-An adaptive, system-wide, decentralized controlled
operating system," Univ. Massachusetts, Amherst, MA, Tech. Rep.
ECE-CS-81-2, 1981.
[122] ——, "Software communication mechanisms: Procedure calls versus
messages," IEEE Computer, vol. 15, Apr. 1982.
[123] ——, "Simulations of three adaptive decentralized controlled, job
scheduling algorithms," Comput. Networks, vol. 8, no. 3,
pp. 199-217, June 1984.
[124] ——, "Bayesian decision theory and its application to decentralized
control of job scheduling," IEEE Trans. Comput., vol. C-34,
Jan. 1985, to be published.
[125] J. A. Stankovic and I. S. Sidhu, "An adaptive bidding algorithm for
processes, clusters and distributed groups," in Proc. 4th Int. Conf.
Distrib. Comput., May 1984.
[126] J. A. Stankovic, K. Ramamritham, and W. Kohler, "Current research
and critical issues in distributed system software," Dep. Elec. Comput.
Eng., Univ. Massachusetts, Amherst, MA, Tech. Rep., 1984.
[127] H. S. Stone, "Multiprocessor scheduling with the aid of network flow
algorithms," IEEE Trans. Software Eng., vol. SE-3, Jan. 1977.
[128] ——, "Critical load factors in distributed computer systems," IEEE Trans. Software Eng., vol. SE-4, May 1978.
[129] H. S. Stone and S. H. Bokhari, "Control of distributed processes," IEEE Computer, vol. 11, pp. 97-106, July 1978.
[130] M. Stonebraker and E. Neuhold, "A distributed database version of INGRES," in Proc. 1977 Berkeley Workshop Distrib. Data Management Comput. Networks, pp. 19-36.
[131] M. Stonebraker, "Operating system support for database management," Commun. ACM, vol. 24, pp. 412-418, July 1981.
[132] H. Sturgis, J. Mitchell, and J. Israel, "Issues in the design and use of a distributed file system," ACM Oper. Syst. Rev., July 1980.
[133] D. Swinehart, G. McDaniel, and G. Boggs, "WFS: A simple shared file system for a distributed environment," in Proc. 7th Symp. Oper. Syst. Princ., Dec. 1979.
[134] A. S. Tanenbaum, Computer Networks. Englewood Cliffs, NJ: Prentice-Hall, 1981.
[135] R. R. Tenny and N. R. Sandell, Jr., "Structures for distributed decision-making," IEEE Trans. Syst., Man, Cybern., vol. SMC-11, pp. 517-527, Aug. 1981.
[136] ——, "Strategies for distributed decision-making," IEEE Trans. Syst., Man, Cybern., vol. SMC-11, pp. 527-538, Aug. 1981.
[137] R. H. Thomas, "A majority consensus approach to concurrency control for multiple copy databases," ACM Trans. Database Syst., vol. 4, no. 2, pp. 180-209, June 1979.
[138] D. Tsay and M. Liu, "MIKE: A network operating system for the distributed double-loop computer network," IEEE Trans. Software Eng., vol. SE-9, no. 2, Mar. 1983.
[139] A. van Dam and J. Michel, "Experience with distributed processing on a host/satellite graphics system," in Proc. SIGGRAPH, July 1976.
[140] B. G. Walker, G. Popek, R. English, C. Kline, and G. Thiel, "The LOCUS distributed operating system," in Proc. 9th Symp. Oper. Syst. Princ., Oct. 1983.
[141] S. Ward, "TRIX: A network oriented operating system," in Proc. COMPCON, 1980.
[142] M. V. Wilkes and R. M. Needham, The Cambridge CAP Computer and its Operating System. Amsterdam, The Netherlands: Elsevier North-Holland, 1979.
[143] L. Wittie and A. M. Van Tilborg, "MICROS, A distributed operating system for MICRONET, A reconfigurable network computer," IEEE Trans. Comput., vol. C-29, Dec. 1980.
[144] W. Wulf, E. Cohen, W. Corwin, A. Jones, R. Levin, C. Pierson, and F. Pollack, "HYDRA: The kernel of a multiprocessor operating system," Commun. ACM, vol. 17, June 1974.
John A. Stankovic (S'77-M'79) received the
Sc.B. degree in electrical engineering in 1970, and
the Sc.M. and Ph.D. degrees in computer science in
1976 and 1979, respectively, all from Brown University, Providence, RI.
He is now an Associate Professor in the Department of Electrical and Computer Engineering,
University of Massachusetts, Amherst, MA. He has
been active in distributed systems research since
1976. His current research includes various ap-
proaches to process scheduling on loosely coupled
networks and recovery protocols for distributed databases. He has been
involved in CARAT, a distributed systems testbed project at the University
of Massachusetts.
Prof. Stankovic was coeditor of the January 1978 Special Issue of IEEE
Computer on Distributed Processing. He now serves as the Vice Chairman of
the IEEE Technical Committee on Distributed Operating Systems. In this
capacity he has been responsible for serving as the Editor of two Special Issues
of the Technical Committee's Newsletter. He received the 1983 Outstanding
Junior Faculty Award for the School of Engineering, University of Massachusetts. He is a member of ACM and Sigma Xi.