Chapter 3 Naming and Threads


Processes and Threads

Naming in Distributed Systems

Chapter 3
Outline
• Processes and Threads in DS
• Clients and Servers
• Naming Basics
• Name Resolution
• Flat Naming vs. Hierarchical Naming
• Name Services
• Name Spaces
Processes and Threads in Distributed Systems
• In distributed systems, processes and threads are fundamental
concepts that play a crucial role in enabling concurrent execution,
efficient resource management, and overall system performance.
• They provide the means to structure and manage the execution of
tasks and programs in a distributed environment.
• Here's an introduction to processes and threads in distributed
systems:
Cont’d…
• Processes:
• A process is an independent and self-contained program execution
unit within an operating system.
• Each process operates in its own isolated memory space and has its
own system resources, making it independent of other processes.
• Key characteristics of processes in distributed systems include:
• Isolation: Processes are isolated from one another, so a crash or
memory error in one process generally cannot corrupt the memory of
another. This isolation enhances system security and stability.
Cont’d…
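• Resource Management: Each process has its own address space and
OS-managed resources, such as memory and file handles, and the OS
schedules CPU time among processes. Tracking resources per process
makes them easier to account for and limit, although processes still
compete for shared resources such as CPU and I/O.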
• Security: Processes can be individually protected and managed.
Access control mechanisms can be applied to restrict unauthorized
access and interactions between processes.
• Parallelism: Processes enable parallel execution on multi-core
processors, which is essential for leveraging the full potential of
modern hardware.
Cont’d…
• Processes are typically created using system calls provided by the
operating system (e.g., fork/exec on Unix-like systems). Processes on
the same machine can communicate through inter-process communication
(IPC) mechanisms like pipes, shared memory, or message queues, while
processes on different machines communicate by message passing,
typically over network sockets.
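Process creation and pipe-based IPC can be sketched in a few lines of Python. This is a minimal illustration, not a recipe: it assumes the standard subprocess module, which asks the OS to start a new process and connects its stdin/stdout to pipes; the function name run_child and the upper-casing child program are hypothetical.

```python
import subprocess
import sys

def run_child(message: str) -> str:
    """Start a separate Python process and exchange data with it over pipes."""
    # Hypothetical child program: read everything from stdin, write it back upper-cased.
    child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        input=message,        # written to the child's stdin pipe
        capture_output=True,  # the child's stdout/stderr come back through pipes
        text=True,
        check=True,           # raise if the child exits with an error
    )
    return result.stdout

if __name__ == "__main__":
    print(run_child("hello from the parent"))  # HELLO FROM THE PARENT
```

The parent and child share no memory; the only data that crosses between them is what is written to the pipes.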
• Threads:
• A thread is a lightweight unit of execution within a process.
• Threads share the same memory space and resources within a
process and are sometimes referred to as "lightweight processes."
• Key characteristics of threads in distributed systems include:
Cont’d…
• Concurrency: Threads enable concurrent execution of multiple tasks
within a single process, taking advantage of multi-core processors.
This concurrency improves system responsiveness and throughput.
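• Efficiency: Threads have lower overhead than processes because they
share the process's memory and resources: creating a thread or
switching between threads of the same process does not require setting
up or switching an address space. This makes them well suited to tasks
that need frequent context switching or fine-grained parallelism.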
• Shared State: Threads within the same process can easily share data
and communicate with each other, making them suitable for
collaborative and closely related tasks.
Cont’d…
• In distributed systems, processes and threads are essential for various
reasons, including:
• Handling concurrent requests: Processes and threads can handle multiple
incoming requests simultaneously, improving system responsiveness and
throughput.
• Resource utilization: By distributing tasks across processes or threads,
distributed systems can utilize system resources efficiently, reducing idle
time.
• Isolation and security: Processes provide strong isolation, while threads
can be useful for sharing data and resources within a controlled context.
• Parallelism and scalability: Processes and threads enable distributed
systems to take advantage of multi-core processors, improving
performance and scalability.
Clients and servers
• Clients and servers in distributed systems use processes and threads
to handle various tasks and provide services efficiently.
• Here's how clients and servers employ processes and threads to
achieve their respective roles:
• Servers:
• Process Per Connection Model:
• Some servers use a "process per connection" model, where a new process is
created for each incoming client connection.
• This model provides strong isolation, ensuring that client interactions do not
interfere with each other.
• Each process can be dedicated to handling a single client request or session,
allowing the server to maintain multiple concurrent connections.
Cont’d…
• Thread Per Connection Model:
• An alternative to the process per connection model is the "thread per
connection" model.
• In this approach, a new thread is created for each incoming client
connection within a server process.
• Threads share the same process address space but have their own
execution context.
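• This model is more lightweight than creating separate processes and
can handle more simultaneous connections, although at very large
connection counts per-thread stack memory and scheduling overhead
become significant, which is why many servers move to thread pools or
event-driven I/O.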
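The thread-per-connection model can be sketched with Python's standard socketserver module, whose ThreadingTCPServer starts a new thread for every accepted connection. This is a minimal sketch, assuming a line-based echo protocol; EchoHandler, start_server, and send_request are hypothetical names. Swapping in socketserver.ForkingTCPServer (available only on POSIX systems) would give the process-per-connection model with the same handler.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.StreamRequestHandler):
    """Serves one client connection; ThreadingTCPServer runs each instance in its own thread."""

    def handle(self):
        # Answer each newline-terminated request until the client closes the connection.
        for line in self.rfile:
            text = line.decode().strip()
            reply = f"{threading.current_thread().name}: {text}\n"
            self.wfile.write(reply.encode())

def start_server(host="127.0.0.1", port=0):
    # Port 0 lets the OS pick a free port; read it back from server.server_address.
    server = socketserver.ThreadingTCPServer((host, port), EchoHandler)
    server.daemon_threads = True  # handler threads will not block interpreter exit
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def send_request(address, text):
    # A minimal client: one connection, one request line, one reply line.
    with socket.create_connection(address) as sock:
        sock.sendall(text.encode() + b"\n")
        with sock.makefile("r") as reader:
            return reader.readline().strip()

if __name__ == "__main__":
    server = start_server()
    print(send_request(server.server_address, "hello"))
    print(send_request(server.server_address, "again"))  # handled by a different thread
    server.shutdown()
    server.server_close()
```

Each reply is prefixed with the name of the thread that handled it, which makes the one-thread-per-connection behavior visible.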
Cont’d…
• Worker Threads or Processes:
• Servers often employ a pool of worker processes or threads to handle
incoming client requests. When a new client connection is established, the
server assigns the task to an available worker thread or process. Reusing a
fixed pool avoids the cost of creating a new thread or process per request,
balances the load across workers, and bounds resource usage under heavy load.
• Task Parallelism:
• Within a server, multiple threads or processes may be responsible for
different tasks, such as listening for incoming connections, processing client
requests, and handling database queries. This division of labor improves the
server's overall performance and responsiveness.
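A worker pool can be sketched as a fixed set of threads pulling requests from a shared queue. This is a minimal Python sketch, assuming in-process requests identified by integers; handle_request and run_worker_pool are hypothetical names, and a real server would feed the queue from accepted connections.

```python
import queue
import threading

def handle_request(request_id: int) -> str:
    # Hypothetical handler; a real server would parse the request and do the work here.
    return f"response-{request_id}"

def run_worker_pool(requests, num_workers=3):
    """Serve requests with a fixed pool of worker threads fed from a shared queue."""
    tasks = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            request_id = tasks.get()   # blocks until work (or the stop signal) arrives
            if request_id is None:     # sentinel: no more requests
                return
            response = handle_request(request_id)
            with results_lock:
                results[request_id] = (threading.current_thread().name, response)

    workers = [threading.Thread(target=worker, name=f"worker-{i}") for i in range(num_workers)]
    for w in workers:
        w.start()
    for request_id in requests:
        tasks.put(request_id)          # whichever worker is idle picks it up next
    for _ in workers:
        tasks.put(None)                # one stop signal per worker
    for w in workers:
        w.join()
    return results

if __name__ == "__main__":
    for request_id, (worker_name, response) in sorted(run_worker_pool(range(6)).items()):
        print(request_id, worker_name, response)
```

The number of threads stays fixed no matter how many requests arrive; extra requests simply wait in the queue.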
Cont’d…
• Shared State Management:
• When multiple threads or processes are used within a server,
mechanisms for managing shared state and data are critical.
• This may involve using synchronization techniques, such as locks or
semaphores, to prevent data corruption and ensure thread-safe
access to resources.
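The need for locks can be sketched with a counter shared by several threads. This is a minimal Python sketch using threading.Lock; Counter and run_threads are hypothetical names. Without the lock, two threads can read the same old value and one increment is lost.

```python
import threading

class Counter:
    """Shared state protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # "value += 1" is a read-modify-write; the lock stops two threads from
        # reading the same old value and losing one of the updates.
        with self._lock:
            self.value += 1

def run_threads(num_threads=8, increments_per_thread=10_000):
    counter = Counter()

    def work():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

if __name__ == "__main__":
    print(run_threads())  # 80000
```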
Clients
• Concurrent Requests:
• Clients may use threads to issue multiple concurrent requests to remote
servers.
• Each thread can send a request, receive a response, and manage its own
session with the server.
• This concurrency lets the client overlap network waits, so the total time
to complete several requests approaches that of the slowest request rather
than the sum of all of them.
• Asynchronous Operations:
• In some cases, clients use asynchronous programming models, such as
asynchronous I/O or event-driven architectures, to issue requests to servers.
These models allow clients to continue executing other tasks while waiting for
responses from the server, improving overall client responsiveness.
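Asynchronous requests can be sketched with Python's asyncio. This is a minimal sketch, assuming the network call is simulated: fake_request is a hypothetical stand-in that only sleeps, and the server names and delays are illustrative.

```python
import asyncio

async def fake_request(server: str, delay: float) -> str:
    # Hypothetical stand-in for a network round trip; awaiting lets other tasks run meanwhile.
    await asyncio.sleep(delay)
    return f"reply from {server}"

async def fetch_all(latencies: dict[str, float]) -> list[str]:
    # Start every request at once; gather returns the replies in the order requested.
    tasks = [fake_request(server, delay) for server, delay in latencies.items()]
    return await asyncio.gather(*tasks)

async def main():
    loop = asyncio.get_running_loop()
    start = loop.time()
    replies = await fetch_all({"s1": 0.3, "s2": 0.1, "s3": 0.2})
    elapsed = loop.time() - start
    # Elapsed time is close to the slowest delay, not the sum of all three.
    print(replies, f"{elapsed:.2f}s")

if __name__ == "__main__":
    asyncio.run(main())
```

The client issues all three requests from a single thread and keeps running while it waits, which is the responsiveness benefit described above.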
Cont’d…
• Parallelism in Distributed Computing:
• In distributed computing scenarios, clients may employ threads to parallelize
data processing tasks or computation across multiple servers.
• This parallelism accelerates data analysis, simulations, or other computational
workloads.
• Load Balancing:
• Clients can use multiple threads or processes to interact with multiple server
instances or replicas, distributing the load and ensuring redundancy.
• Load balancing mechanisms help optimize resource utilization and enhance
system reliability.
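Client-side load balancing can be sketched as round-robin selection over replicas, shared by several client threads. This is a minimal Python sketch; RoundRobinBalancer, dispatch, and the replica names are hypothetical, and a real client would send each request to the chosen replica instead of just counting it.

```python
import itertools
import threading
from collections import Counter

class RoundRobinBalancer:
    """Client-side round-robin selection over server replicas."""

    def __init__(self, replicas):
        self._cycle = itertools.cycle(replicas)
        self._lock = threading.Lock()  # many client threads share one cycle

    def next_replica(self):
        with self._lock:
            return next(self._cycle)

def dispatch(balancer, num_threads=3, requests_per_thread=10):
    # Each client thread picks a replica per request; here we only tally the choices.
    tally = Counter()
    tally_lock = threading.Lock()

    def client():
        for _ in range(requests_per_thread):
            replica = balancer.next_replica()
            with tally_lock:
                tally[replica] += 1

    threads = [threading.Thread(target=client) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tally

if __name__ == "__main__":
    print(dispatch(RoundRobinBalancer(["replica-1", "replica-2", "replica-3"])))
```

Round-robin ignores replica load and health; real load balancers often combine it with health checks.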
Cont’d…
• Resource Management:
• Clients may create and manage threads to handle various tasks, such
as user interfaces, data processing, and communication with servers.
• Effective resource management is essential to ensure that client
applications remain responsive and efficient.
• Clients and servers collaborate to provide distributed services
effectively by using processes and threads for concurrent execution,
load distribution, and resource management.
NEXT…

Naming in Distributed Systems


Introduction to Naming
• In distributed systems, naming is the process of assigning names or
labels to various entities, such as servers, resources, processes, and
services.
• Names are used to uniquely identify and reference these entities
within the system.
• Naming is a fundamental mechanism for communication, resource
location, and coordination in distributed environments.
Cont’d…
• Here are entities commonly named in distributed systems:
• Hosts and Servers:
• Hosts and servers in a network are often given names to identify them. For example,
a web server may be named "webserver1."
• Services:
• Services or software components may be given names to make them accessible and
to locate them.
• This is common in microservices architectures, where each service has a unique
name for discovery and communication.
• Processes and Threads:
• Processes or threads running on a host can be named for identification and
coordination.
• This is particularly important in distributed computing when coordinating tasks
among different processes.
Cont’d…
• Distributed Objects:
• In distributed object-oriented systems, objects are named to enable remote method
invocation. Each object has a unique name or reference that clients can use to
interact with it.
• Resources:
• Various types of resources, such as printers, storage devices, or sensors, are often
named in a distributed environment. Naming allows users or applications to discover
and access these resources.
• Data and Databases:
• Databases and the data within them can be named for querying and
retrieval. Names are used to identify tables, records, or specific data items.
Name Resolution:
• Name resolution is the process of mapping a human-readable name,
such as a hostname or service name, to a network address or location
that can be used for communication in a distributed system.
• When components in a distributed system need to communicate with
each other, they use names to initiate connections, so name
resolution is essential to determine where the named entity can be
found in the network.
Cont’d…
• Name resolution is an essential part of networking and distributed
systems, and it ensures that entities can locate and communicate
with each other by using names instead of numerical addresses.
• Here's how name resolution works in typical distributed systems:
• Name Request:
• When a component in a distributed system wants to communicate with
another component, it uses a human-readable name (e.g., a domain name, a
service name, or a hostname) to identify the target entity.
Cont’d…
• Local Cache Lookup:
• To expedite the name resolution process, the requesting component
may first check its local cache, which stores previously resolved name-
to-address mappings.
• If the name is found in the cache and is still valid, the component can
use the cached address for communication, avoiding the need for
external resolution.
Cont’d…
• Local Configuration:
• If the name is not found in the local cache, the component
may check its local configuration files or settings. Some systems allow
administrators to configure custom mappings of names to addresses locally
(for example, the hosts file, /etc/hosts on Unix-like systems).
• Name Service Query:
• If the name cannot be resolved locally, the component
contacts a name service, which is responsible for maintaining a central or
distributed database of name-to-address mappings.
• Common examples of name services include DNS (Domain Name System) for
internet domain names or service discovery systems in microservices
architectures.
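The lookup order described in these steps (local cache, then local configuration, then a name service) can be sketched as a small resolver class. This is a minimal Python sketch, not how any particular OS resolver is implemented: Resolver and its parameters are hypothetical, the default name service is the OS resolver via socket.getaddrinfo, and the printer address comes from a documentation IP range.

```python
import socket
import time

class Resolver:
    """Minimal sketch of the lookup order: local cache -> local configuration -> name service."""

    def __init__(self, local_config=None, name_service=None, ttl=60.0):
        self.cache = {}                          # name -> (address, expiry time)
        self.local_config = local_config or {}   # static, hosts-file-like mappings
        self.name_service = name_service or self._system_lookup
        self.ttl = ttl                           # seconds a cached answer stays valid

    @staticmethod
    def _system_lookup(name):
        # Ask the operating system's resolver (hosts file, DNS, ...) and take the first address.
        return socket.getaddrinfo(name, None)[0][4][0]

    def resolve(self, name):
        now = time.monotonic()
        entry = self.cache.get(name)
        if entry is not None and entry[1] > now:   # 1. cached and still valid
            return entry[0]
        if name in self.local_config:              # 2. local configuration
            return self.local_config[name]
        address = self.name_service(name)          # 3. query the name service
        self.cache[name] = (address, now + self.ttl)
        return address

if __name__ == "__main__":
    resolver = Resolver(local_config={"printer1": "192.0.2.20"})
    print(resolver.resolve("printer1"))   # from local configuration
    print(resolver.resolve("localhost"))  # from the OS resolver, then cached
```

Passing a different name_service callable makes it easy to plug in a DNS client or a service discovery lookup instead of the OS resolver.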
Cont’d…
• Caching in Name Services:
• Name services often have their own caching mechanisms to store resolved
name-to-address mappings temporarily. This helps reduce the load on the
name service and speeds up future name resolution requests for the same
names.
• Iterative Resolution (DNS):
• In iterative resolution, a DNS server that cannot answer a query returns a
referral to another server that is closer to the answer. Typically the client
sends its query to a local (recursive) DNS resolver, which then queries
iteratively on the client's behalf: it asks a root server, is referred to the
top-level-domain server (e.g., for .com), and is then referred to the
authoritative server for the domain in question, which returns the address.
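Following referrals can be sketched with an in-memory model of the DNS hierarchy. This is a minimal Python sketch, not a DNS client: SERVERS, query, and resolve_iteratively are hypothetical, the server names are made up, and the address comes from a documentation IP range.

```python
# Hypothetical zone data: each server maps a name suffix either to an answer
# ("A", address) or to a referral ("NS", next server) closer to the answer.
SERVERS = {
    "root": {"com.": ("NS", "com-tld")},
    "com-tld": {"example.com.": ("NS", "example-auth")},
    "example-auth": {"www.example.com.": ("A", "192.0.2.10")},
}

def query(server, name):
    """Ask one server; it returns its most specific matching record for the name."""
    zone = SERVERS[server]
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])   # "www.example.com.", "example.com.", "com.", ...
        if suffix in zone:
            return zone[suffix]
    raise LookupError(f"{server} has no record for {name}")

def resolve_iteratively(name, start="root", max_referrals=10):
    """Follow referrals from the root until some server returns an answer."""
    server, path = start, [start]
    for _ in range(max_referrals):
        kind, value = query(server, name)
        if kind == "A":
            return value, path
        server = value                  # referral: the resolver asks the next server itself
        path.append(server)
    raise LookupError(f"too many referrals while resolving {name}")

if __name__ == "__main__":
    print(resolve_iteratively("www.example.com."))
    # ('192.0.2.10', ['root', 'com-tld', 'example-auth'])
```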
Name Services:
• Name services are components or systems responsible for
maintaining the mapping of human-readable names to network
addresses or locations.
• They act as the authoritative source for name-to-address mappings;
such a service may be logically centralized but is often physically
distributed for scalability and fault tolerance.
• For example, the Domain Name System (DNS) serves as a global name
service for the internet, mapping domain names to IP addresses.
• Various systems and protocols are used for implementing name
services in distributed environments.
Cont’d…
• Here are some of the common systems and protocols used for name
services:
• Domain Name System (DNS):
• DNS is one of the most well-known and widely used name services. It's the
naming system for the internet, translating domain names (e.g.,
www.example.com) into IP addresses.
• DNS is hierarchical, with a distributed architecture that includes authoritative
name servers, caching resolvers, and recursive resolvers.
• Lightweight Directory Access Protocol (LDAP):
• LDAP is a directory service protocol that provides name services for directory-
based systems. It's commonly used in enterprise environments for storing and
retrieving information about users, devices, and resources.
Cont’d…
• Service Discovery Systems
• Service discovery systems are widely used in microservices architectures,
allowing services to register and discover one another.
• They maintain a registry of service names and their locations, often with
health checks that support load balancing and failover. Examples include
Consul, etcd, and Apache ZooKeeper.
• Distributed Hash Tables (DHTs):
• DHTs are decentralized data structures that can implement distributed
name services. They allow distributed entities to map keys (names) to values
(data or locations) without a central server. DHTs are commonly used in
peer-to-peer systems; Chord is a well-known DHT lookup protocol.
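The core DHT idea, hashing both nodes and keys onto a ring and assigning each key to its successor node, can be sketched in Python. This is a simplified sketch in the style of Chord: it computes the successor from a global view of the ring and leaves out Chord's finger tables and node-to-node routing. Ring, ring_id, M, and the node names are hypothetical.

```python
import bisect
import hashlib

M = 16  # bits in the identifier space (Chord uses 160-bit SHA-1 ids; 16 keeps numbers small)

def ring_id(name: str) -> int:
    """Hash a node name or key onto the identifier ring [0, 2**M)."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

class Ring:
    """Global view of a Chord-style ring (no finger tables or node-to-node routing)."""

    def __init__(self, node_names):
        self.nodes = sorted((ring_id(n), n) for n in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]

    def successor(self, key: str) -> str:
        """Return the node responsible for key: the first node id >= key id, wrapping around."""
        k = ring_id(key)
        idx = bisect.bisect_left(self.ids, k) % len(self.ids)
        return self.nodes[idx][1]

if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    for key in ["report.pdf", "printer-3", "alice"]:
        print(key, "->", ring.successor(key))
```

Because responsibility depends only on hash values, any node holding the same view computes the same owner for a name, with no central directory.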
Name Spaces:
• Distributed systems often define separate name spaces for different
types of entities.
• For example, a system may use one name space for hosts (DNS domain
names) and another for files (path names in a distributed file system).
• This separation helps manage and access different types of entities.
• A namespace, in the context of distributed systems and computing, is
a logical or virtual container that provides a way to organize and
manage the naming of entities, resources, or identifiers within a
system.
Cont’d…
• Namespaces are designed to prevent naming conflicts and improve
organization by partitioning the global name space into smaller, more
manageable units. Here's how namespaces work:
• Organizing Entities:
• Namespaces are used to organize and group entities, resources, or identifiers. Each
namespace represents a distinct, isolated container for naming. This organization is
especially useful in large and complex distributed systems where a variety of entities
need to be uniquely identified.
• Hierarchical Structure:
• Namespaces are often structured hierarchically, forming a tree-like or
parent-child relationship among namespaces. This hierarchy allows for the
nesting of namespaces within one another, creating a logical structure.
Cont’d…
• Namespace Resolution:
• When a distributed system receives a request or reference that includes a
name, the system uses the concept of namespaces to determine which
namespace the name belongs to. Namespace resolution helps identify the
appropriate container for the name, allowing the system to access the entity
within the correct context.
• Eliminating Naming Conflicts:
• Namespaces prevent naming conflicts by providing a context or scope
for each set of names. A name must still be unique within its own
namespace, but entities with identical names in different namespaces do
not conflict, as they are considered distinct within their respective
containers.
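Namespace resolution and conflict-free reuse of names can be sketched with one binding table per namespace. This is a minimal Python sketch; Namespace and resolve are hypothetical helpers, the namespace labels echo the DNS example, and the addresses come from documentation IP ranges.

```python
class Namespace:
    """A naming context: names must be unique inside one namespace, not across namespaces."""

    def __init__(self, label: str):
        self.label = label
        self._bindings: dict[str, str] = {}

    def bind(self, name: str, entity: str) -> None:
        if name in self._bindings:
            raise ValueError(f"{name!r} is already bound in namespace {self.label!r}")
        self._bindings[name] = entity

    def lookup(self, name: str) -> str:
        return self._bindings[name]

def resolve(namespaces: dict[str, Namespace], context: str, name: str) -> str:
    """Namespace resolution: select the container first, then look the name up inside it."""
    return namespaces[context].lookup(name)

if __name__ == "__main__":
    # Hypothetical namespaces and addresses (documentation IP ranges).
    spaces = {label: Namespace(label) for label in ("example.com", "another.com")}
    spaces["example.com"].bind("www", "192.0.2.1")
    spaces["another.com"].bind("www", "198.51.100.1")  # same name, different namespace
    print(resolve(spaces, "example.com", "www"))  # 192.0.2.1
    print(resolve(spaces, "another.com", "www"))  # 198.51.100.1
```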
Cont’d…
• Domain Name System (DNS):
• The DNS is a classic example of a namespace. In the DNS, domain
names are organized hierarchically, with top-level domains like .com,
.org, and .net, and subdomains like example.com or
subdomain.example.com. Each domain or subdomain represents a
namespace. For example:
• "example.com" and "another.com" can coexist with the same subdomain
name, such as "www," without conflicts, as they belong to different
namespaces.
Cont’d…
• File Systems:
• File systems use namespaces to organize files and directories. In this
case, each directory is a namespace, and file names must be unique
within the context of their parent directory. For example:
• In a file system, two different directories can each contain a file
named "document.txt". Each directory is a separate namespace, so
"document.txt" only has to be unique within its parent directory.
Types of Naming
• In distributed systems, various types of naming schemes and approaches
are used to uniquely identify and address entities, resources, or services.
• These naming methods are designed to meet the specific needs of different
distributed systems.
• Flat Naming:
• Explanation: Flat naming uses unstructured identifiers drawn from a single
name space, where every entity has a unique name. There are no hierarchical
or structured elements in the names, so a name by itself reveals nothing about
where the entity is located.
• Use Cases: Flat naming is simplest when the number of entities is limited
and the naming scheme is simple, as in small-scale, non-hierarchical systems.
Larger systems that use flat identifiers need a distributed resolution scheme,
such as a DHT, to locate entities.
Hierarchical (Structured) Naming:
• Explanation: Hierarchical naming organizes names in a tree-like
structure, where each name consists of components that form a path
from the root to the entity.
• This hierarchy allows for categorization and organization of entities,
making it easier to manage and locate them.
• Use Cases: Hierarchical naming is useful in large-scale distributed
systems where entities can be grouped and organized logically.
• Examples include the Domain Name System (DNS) for internet
domains or file systems with directories and files.
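Resolving a hierarchical name means walking the path from the root one component at a time, with each directory acting as the namespace for the next component. This is a minimal Python sketch over a hypothetical in-memory directory tree; ROOT and resolve_path are illustrative names, not a file system API.

```python
# Hypothetical directory tree: a dict is a directory (one namespace per directory),
# a str is a file's contents.
ROOT = {
    "home": {
        "alice": {"document.txt": "Alice's notes"},
        "bob": {"document.txt": "Bob's report"},
    },
    "etc": {"hosts": "127.0.0.1 localhost"},
}

def resolve_path(root: dict, path: str):
    """Resolve an absolute path one component at a time, starting from the root."""
    node = root
    for component in (c for c in path.split("/") if c):
        if not isinstance(node, dict) or component not in node:
            raise FileNotFoundError(path)
        node = node[component]  # descend into the child named by this component
    return node

if __name__ == "__main__":
    print(resolve_path(ROOT, "/home/alice/document.txt"))  # Alice's notes
    print(resolve_path(ROOT, "/home/bob/document.txt"))    # Bob's report
```

The two "document.txt" files do not clash because each is looked up inside a different directory.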
Cont’d…
Semantic (Attribute-Based) Naming:
• Explanation: Semantic naming uses names that convey meaning or
attributes about the entity or resource they represent.
• Names are chosen based on their semantic relevance, making them
self-descriptive, human-readable, and more intuitive to use.
• In attribute-based systems, an entity is described by (attribute, value)
pairs and can be found by searching on those attributes, as in
LDAP-style directory services.
• Use Cases: Semantic naming is often used in contexts where the
naming scheme needs to be user-friendly or self-documenting.
• For example, in a document management system, files may be named
based on their content or purpose.
Cont’d…
• Example: In a distributed system for managing scientific papers,
attribute-based naming may involve naming papers based on their
attributes, such as the title, author, and publication year. For example,
a paper might be named "Astronomy_Guide_to_Stars_2023.pdf." In
this name:
• "Astronomy" represents the subject or category of the paper.
• "Guide_to_Stars" is the title of the paper.
• "2023" is the publication year.
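The naming convention in this example can be sketched as an encoder/decoder pair plus a search by attribute values. This is a minimal Python sketch, assuming every name follows Subject_Title_Year.pdf and the subject itself contains no underscore; PaperName and find are hypothetical helpers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PaperName:
    subject: str
    title: str
    year: int

    def to_filename(self) -> str:
        # Encode the attributes in the order used above: subject, title, year.
        return f"{self.subject}_{self.title.replace(' ', '_')}_{self.year}.pdf"

    @classmethod
    def from_filename(cls, filename: str) -> "PaperName":
        # Decode: subject is before the first "_", year is after the last "_".
        stem = filename.removesuffix(".pdf")
        subject, rest = stem.split("_", 1)
        title, year = rest.rsplit("_", 1)
        return cls(subject, title.replace("_", " "), int(year))

def find(papers, **attrs):
    # Attribute-based lookup: select entities by attribute values instead of an exact name.
    return [p for p in papers if all(getattr(p, k) == v for k, v in attrs.items())]

if __name__ == "__main__":
    paper = PaperName.from_filename("Astronomy_Guide_to_Stars_2023.pdf")
    print(paper)  # PaperName(subject='Astronomy', title='Guide to Stars', year=2023)
    print(find([paper], subject="Astronomy", year=2023))
```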
The End!!!

Thank you for your attention!!!
