Chapter 4-Naming
Chapter 4-Naming
Chapter 4-Naming
Introduction
names play an important role to:
share resources
uniquely identify entities
refer to locations
etc.
an important issue is that a name can be resolved to the entity
it refers to
to resolve names, it is necessary to implement a naming
system
in a distributed system, the implementation of a naming
system is itself often distributed, unlike in nondistributed
systems
efficiency and scalability of the naming system are the main
issues
2
Objectives of the Chapter
we discuss how
human friendly names are organized and implemented; e.g.,
those for file systems and the WWW
names are used to locate mobile entities
to remove names that are no more used, also called garbage
collection
3
4.1 Naming Entities
4
access point
the name of an access point is called an address (such as
IP address and port number as used by the transport layer)
the address of the access point of an entity is also referred
to as the address of the entity
an entity can have more than one access point (similar to
accessing an individual through different telephone
numbers)
an entity may change its access point in the course of time
(e.g., a mobile computer getting a new IP address as it
moves)
5
an address is a special kind of name
it refers to at most one entity
each entity is referred by at most one address; even when
replicated such as in Web pages
an entity may change an access point, or an access point
may be reassigned to a different entity (like telephone
numbers in offices)
separating the name of an entity and its address makes it
easier and more flexible; such a name is called location
independent
there are also other types of names that uniquely identify an
entity; in any case an identifier is a name with the following
properties
it refers to at most one entity
each entity is referred by at most one identifier
it always refers to the same entity (never reused)
identifiers allow us to unambiguously refer to an entity
6
examples
name of an FTP server (entity)
URL of the FTP server
address of the FTP server
IP number:port number
the address of the FTP server may change
7
4.2 Name Spaces and Name Resolution
names in a distributed system are organized into a name space
a name space is generally organized as a labeled, directed
graph with two types of nodes
leaf node: represents the named entity and stores
information such as its address or the state of that entity
directory node: a special entity that has a number of outgoing
edges, each labeled with a name
10
symbolic link: representing an entity by a leaf node and
instead of storing the address or state of the entity, the
node stores an absolute path name
1. Mounting
as an example, consider a mounted file system, which
can be generalized to other name spaces as well
let a directory node store the directory node from a
different (foreign) name space
the directory node storing the node identifier is called a
mount point
the directory node in the foreign name space is called a
mounting point, normally the root of a name space
during name resolution, the mounting point is looked up
and resolution proceeds by accessing its directory table
12
consider a collection of name spaces distributed across
different machines (each name space implemented by a
different server)
to mount a foreign name space in a DS, the following are at
least required
the name of an access protocol (for communication)
the name of the server
the name of the mounting point
each of these names needs to be resolved
to the implementation of the protocol
to an address where the server can be reached
to a node identifier in the foreign name space
the three names can be listed as a URL
13
example: Sun’s Network File System (NFS) is a distributed file
system with a protocol that describes how a client can access
a file stored on a (remote) NFS file server
an NFS URL may look like nfs://flits.cs.vu.nl/home/steen
- nfs is an implementation of a protocol
- flits.cs.vu.nl is a server name to be resolved using DNS
- /home/steen is resolved by the server
e.g., the subdirectory /remote includes mount points for
foreign name spaces on the client machine
a directory node named /remote/vu is used to store
nfs://flits.cs.vu.nl/home/steen
consider /remote/vu/mbox
this name is resolved by starting at the root directory on
the client’s machine until node /remote/vu, which returns
the URL nfs://flits.cs.vu.nl/home/steen
this leads the client machine to contact flits.cs.vu.nl
using the NFS protocol
then the file mbox is read in the directory /home/steen
14
mounting remote name spaces through a specific process protocol
15
The Implementation of a Name Space
a name space forms the heart of a naming service
a naming service allows users and processes to add,
remove, and lookup names
a naming service is implemented by name servers
for a distributed system on a single LAN, a single server
might suffice; for a large-scale distributed system the
implementation of a name space is distributed over multiple
name servers
Name Space Distribution
in large scale distributed systems, it is necessary to
distribute the name service over multiple name servers,
usually organized hierarchically
a name service can be partitioned into logical layers
the following three layers can be distinguished
16
global layer
formed by highest level nodes (root node and nodes close
to it or its children)
nodes on this layer are characterized by their stability, i.e.,
directory tables are rarely changed
they may represent organizations, groups of
organizations, ..., where names are stored in the name
space
administrational layer
groups of entities that belong to the same organization or
administrational unit, e.g., departments
relatively stable
managerial layer
nodes that may change regularly, e.g., nodes representing
hosts of a LAN, shared files such as libraries or binaries,
…
nodes are managed not only by system administrators, but
also by end users 17
an example partitioning of the DNS name space, including Internet-
accessible files, into three layers 18
some requirements of servers at different layers
performance (responsiveness to lookups), availability (failure
rate), etc.
high availability is critical for the global layer, since name
resolution cannot proceed beyond the failing server; it is also
important at the administrational layer for clients in the same
organization
performance is very important in the lowest layer, since
results of lookups can be cached and used due to the relative
stability of the higher layers
they may be enhanced by client side caching (global and
administrational layers since names do not change often)
and replication; they create implementation problems since
they may introduce inconsistency problems (see Chapter 6)
19
Item Global Administrational Managerial
20
Implementation of Name Resolution
recall that name resolution consists of finding the address
when the name is given
assume that name servers are not replicated and that no
client-side caches are allowed
each client has access to a local name resolver, responsible
for ensuring that the name resolution process is carried out
e.g., assume the path name
root:<nl, vu, cs, ftp, pub, globe, index.txt>
is to be resolved
or using a URL notation, this path name would correspond
to ftp://ftp.cs.vu.nl/pub/globe/index.txt
two ways of implementing name resolution
iterative name resolution
recursive name resolution
21B
Iterative
a name resolver hands over the complete name to the root
name server
the root server will resolve the name as far as it can and
return the result to the client; at the minimum it can resolve
the first level and sends the name of the first level name
server to the client
the client calls the first level name server, then the second, ...,
until it finds the address of the entity
m25
Label
each node has a label, a string with a maximum of 63
characters (case insensitive)
the root label is null
children of a node must have different names (to guarantee
uniqueness)
Domain Name
each node has a domain
name
a full domain name is a
sequence of labels
separated by dots (the last
character is a dot;)
domain names are read
from the node up to the
root
full path names must not
exceed 255 characters
26
4.3 Locating Mobile Entities
the naming services discussed so far are used for naming
entities that have fixed locations
they are not well suited for supporting name-to-address
mappings that change regularly as is the case in mobile
entities
mobility could be within the same domain or to a different
domain
e.g. 1; an ftp server called ftp.cs.vu.nl is moved to a new
machine (but within the same domain)
update only the DNS database of the name server for
cs.vu.nl; lookups are not affected
27
the problems with traditional naming services is that they
maintain a direct mapping between human friendly names and
the addresses of entities
each time a name or an address changes, the mapping should
also change
a better solution is to separate naming from locating entities
by introducing identifiers (since it never changes, each entity
has exactly one identifier, and an identifier is never assigned
to a different entity)
a naming service is used to look up an identifier; it gets a
name as input and returns an identifier as output
the identifier (obtained from a naming service) can be stored
locally since it does not change
locating an entity is handled by a location service; it gets an
identifier as input and returns the current address of the
identified entity as output
28
a) direct, single level mapping between names and addresses
b) two-level mapping using identifiers
29
Location Service
two solutions for LANs: Broadcasting and Multicasting, and
Forwarding Pointers
1. Broadcasting and Multicasting
a computer that wants to access another computer for
which it knows its IP address, broadcasts this address
the owner responds by sending its Ethernet address
used by ARP (Address Resolution Protocol) in the Internet
to find the data link address (MAC address) of a machine
broadcasting is inefficient when the network grows
(wastage of bandwidth and too much interruption to other
machines)
multicasting is better when the network grows - send only
to a restricted group of hosts
30
2. Forwarding Pointers
when an entity moves from A to B, it leaves behind a
reference to its new location
advantage
simple: as soon as the first name is located using
traditional naming service, the chain of forwarding
pointers can be used to find the current address
drawbacks
the chain can be too long - locating becomes expensive
all the intermediary locations in a chain have to maintain
their pointers; vulnerability if links are broken
hence, making sure that chains are short and that
forwarding pointers are robust is an important issue
31
3. Home-Based Approaches
the previous approaches have scalability problems
a home location keeps track of the current location of an
entity; often it is the place where an entity was created
it is a two-tiered approach
an example where it is used in Mobile IP
each mobile host uses a fixed IP address
all communication to that IP address is initially directly
sent to the host’s home agent located on the LAN
corresponding to the network address contained in the
mobile host’s IP address
whenever the mobile host moves to another network, it
requests a temporary address in the new network and
informs the new address to the home agent
when the home agent receives a message for the mobile
host it forwards it to its new address and also informs the
sender the host’s current location for sending other
packets 32
home-based approach: the principle of Mobile IP
problems:
creates communication latency
the host is unreachable if the home does no more exist
(permanently changed); the solution is to register the home at a
traditional name service
33
4.4 Removing Unreferenced Entities
when an entity is no longer referenced (by naming and location
services), it must be removed
facilities of automatically removing unreferenced entities are
called distributed garbage collectors
The Problem of Unreferenced Objects
consider remote objects
an object can be accessed only if there is a remote reference
to it
an object for which there is no remote reference to it must be
removed
but there could be two objects, each storing a reference to
the other, but are not referenced at all; these can be
generalized to two or more objects creating a cycle of
objects referring only to each other
such objects must be detected and removed
34
this can be modeled by a graph, where each node represents
an object
there are special objects, such as system wide services and
users, which need not be referenced themselves, called the
root set
the hollow nodes represent objects that are not directly or
indirectly referenced by objects in the root set; such objects
must be removed
36
the problem of maintaining a proper reference count in the presence of
unreliable communication
37
another problem occurs when copying a remote reference to
another process
P1 passes a reference to P2, but the object is yet unaware of
the new reference; then P1 removes its own reference before
P2 contacts the object; creating a race condition
solution: let P1 first inform the object that it is passing a
reference to P2
39
when a new remote reference is created, half of the partial
weight is assigned to the new proxy
40
when a remote reference is duplicated, half of the partial
weight of the proxy is assigned to the new proxy
42
Naive Tracing for garbage collection in Distributed Systems
in a uniprocessor system mark-and-sweep collectors are
used
they use two phases
mark phase: trace all entities from the root set and mark
them (such as recording the entity in a table)
sweep phase: search those that are not marked (those to
be removed)
drawbacks: to ensure that the reachable graph remains the
same, all executing programs needs to be stopped
temporarily and execution is switched to garbage
collection; a scenario called stop-the-world and is not
desirable for distributed garbage collectors
43