DDBS Lec2


CSE 607

Distributed Database
Distributed DBMS Architecture

Source:
1. Principles of Distributed Database Systems
By M. Tamer Özsu, Patrick Valduriez
2. Slides available
Architectural models for distributed DBMS
• Consider the possible ways in which multiple databases may be put
together for sharing by multiple DBMSs.
• Fig. 1.10 classifies the systems with respect to 1) the autonomy of local
systems, 2) their distribution, and 3) their heterogeneity.

Fig. 1.10: DBMS implementation alternatives


Autonomy
• It refers to the distribution of control, not of data. It indicates the degree to which
individual DBMSs can operate independently.

• It is a function of a number of factors: i) whether the component systems exchange
information, ii) whether they can independently execute transactions, and iii)
whether one is allowed to modify them.

Dimensions of autonomy:
1. Design autonomy:
Individual DBMSs are free to use the data models and transaction
management techniques that they prefer.
2. Communication autonomy:
Each of the individual DBMSs is free to make its own decision as to what
type of information it wants to provide to the other DBMSs or to the
software that controls their global execution.
3. Execution autonomy:
Each DBMS can execute the transactions that are submitted to it in any way
that it wants to.
Distribution
The distribution dimension of the taxonomy deals with data.
We consider the physical distribution of data over multiple sites; the user still
sees the data as one logical pool.

We consider two classes of DBMS distribution:
i) client/server distribution and
ii) peer-to-peer distribution (full distribution).

Client/server distribution:
It concentrates data management duties at servers while the clients focus on
providing the application environment including the user interface.

Peer-to-peer distribution:
There is no distinction between client machines and servers. Each machine has full
DBMS functionality and can communicate with other machines to execute
queries and transactions.
Heterogeneity
Heterogeneity may occur in various forms in distributed systems, ranging
from hardware heterogeneity and differences in networking protocols to
variations in data managers.
The important ones relate to data models, query languages, and transaction
management protocols.

Architectural alternatives
• Consider the architectural alternatives starting at the origin in Fig. 1.10, and
moving along the autonomy dimension.
• We use a notation based on the alternatives along the three dimensions.
• The dimensions are identified as A (autonomy), D (distribution), and H
(heterogeneity).
• The alternatives along each dimension are identified by numbers 0, 1 or 2.
• Along the autonomy dimension, 0 represents tight integration, 1 represents
semiautonomous systems and 2 represents total isolation.
• Along the distribution dimension, 0 is for no distribution, 1 is for
client/server systems, and 2 is for peer-to-peer distribution.
• Along the heterogeneity dimension, 0 identifies homogeneous systems
while 1 stands for heterogeneous systems.
Some examples:
(A0, D2, H0): peer-to-peer distributed homogeneous DBMS,
(A0, D1, H0): client/server distribution,
(A2, D2, H1): peer-to-peer distributed heterogeneous multidatabase system.
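
The (A, D, H) notation above can be written down as a small lookup. This is only a sketch for reading the labels; the class name `Architecture` and the dictionary names are made up for illustration and do not come from the textbook.

```python
from dataclasses import dataclass

# Meaning of each value along the three dimensions, as listed on the slide.
AUTONOMY = {0: "tight integration", 1: "semiautonomous", 2: "total isolation"}
DISTRIBUTION = {0: "no distribution", 1: "client/server", 2: "peer-to-peer"}
HETEROGENEITY = {0: "homogeneous", 1: "heterogeneous"}


@dataclass(frozen=True)
class Architecture:
    """One point in the taxonomy (hypothetical helper, for illustration only)."""
    a: int  # autonomy
    d: int  # distribution
    h: int  # heterogeneity

    def __post_init__(self):
        # Reject values that the notation does not define.
        if (self.a not in AUTONOMY or self.d not in DISTRIBUTION
                or self.h not in HETEROGENEITY):
            raise ValueError(f"undefined point: ({self.a}, {self.d}, {self.h})")

    def label(self) -> str:
        return f"(A{self.a}, D{self.d}, H{self.h})"

    def describe(self) -> str:
        return (f"{AUTONOMY[self.a]}, {DISTRIBUTION[self.d]}, "
                f"{HETEROGENEITY[self.h]}")


p2p = Architecture(a=0, d=2, h=0)
print(p2p.label(), "->", p2p.describe())
```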

Distributed DBMS Architecture

Client/Server Systems
• Distinguish the functionality that needs to be provided
and divide these functions into two classes: server
functions and client functions.
• This provides a two-level architecture which makes it
easier to manage the complexity of modern DBMSs and
the complexity of distribution.
• The server does most of the data management work, i.e., all of query
processing and optimization, transaction management and storage
management are done at the server.
• The client, in addition to the application and the user interface, has a DBMS
client module that is responsible for managing the data that is cached to the
client and sometimes managing the transaction locks.
• There is OS and communication software that runs on both the client and the
server. The client/server architecture is depicted in Fig. 4.4.
• The client passes SQL queries to the server without trying to understand or
optimize them. The server does most of the work and returns the result relation
to the client.
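
A minimal sketch of this division of work, assuming an in-memory SQLite database stands in for the server's DBMS. The `Server` and `Client` classes and the `emp` table are invented for illustration; a real client would reach the server over a network rather than by a method call.

```python
import sqlite3


class Server:
    """Stores the data and does all query processing (delegated to SQLite)."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE emp (eno INTEGER PRIMARY KEY, ename TEXT, title TEXT)")
        self._db.executemany(
            "INSERT INTO emp VALUES (?, ?, ?)",
            [(1, "Doe", "Engineer"), (2, "Smith", "Programmer"),
             (3, "Lee", "Programmer")])
        self._db.commit()

    def execute(self, sql, params=()):
        # Parsing, optimization and execution all happen on the server side.
        return self._db.execute(sql, params).fetchall()


class Client:
    """Forwards SQL text to the server without inspecting or optimizing it."""

    def __init__(self, server):
        self._server = server

    def query(self, sql, params=()):
        return self._server.execute(sql, params)


client = Client(Server())
rows = client.query(
    "SELECT ename FROM emp WHERE title = ? ORDER BY eno", ("Programmer",))
print(rows)
```

The client only ever sees the result relation; it has no knowledge of how the query was executed.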

Fig. 4.4: Client/Server Reference Architecture


There are different types of client/server architecture:
1. multiple client-single server: the database is stored on only one
machine (the server) which also hosts the software to manage it and
2. multiple client-multiple server: Two alternative management
strategies are possible: either each client manages its own connection
to the appropriate server or each client knows of only its “home
server” which then communicates with other servers as required.
• The former approach simplifies server code, but loads the client machines
with additional responsibilities (heavy client systems). The latter approach
concentrates the data management functionality at the servers.
• From a data-logical perspective, client/server DBMSs provide the same
view of data as do peer-to-peer systems. They give the user the appearance
of a logically single database, while at the physical level data may be
distributed. Thus the primary distinction between these systems is not in
the level of transparency that is provided to the users and applications, but
in the architectural paradigm that is used to realize this level of
transparency.
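
The two multiple client-multiple server strategies can be contrasted in a few lines. This is a toy sketch: tables are plain Python lists, and the names `Server`, `DirectClient` and `HomeServerClient` are hypothetical.

```python
class Server:
    def __init__(self, name, tables):
        self.name = name
        self.tables = tables   # table name -> list of rows
        self.peers = []        # other servers this server can contact

    def fetch(self, table):
        """Serve the table locally if possible, otherwise ask the other servers."""
        if table in self.tables:
            return self.tables[table]
        for peer in self.peers:
            if table in peer.tables:
                return peer.tables[table]
        raise KeyError(table)


class DirectClient:
    """Heavy client: keeps its own directory of which server holds each table."""

    def __init__(self, directory):
        self.directory = directory  # table name -> Server

    def fetch(self, table):
        return self.directory[table].fetch(table)


class HomeServerClient:
    """Light client: knows only its home server, which contacts the others."""

    def __init__(self, home):
        self.home = home

    def fetch(self, table):
        return self.home.fetch(table)


s1 = Server("s1", {"emp": [("Doe",), ("Smith",)]})
s2 = Server("s2", {"proj": [("P1",), ("P2",)]})
s1.peers, s2.peers = [s2], [s1]

heavy = DirectClient({"emp": s1, "proj": s2})
light = HomeServerClient(home=s1)
print(heavy.fetch("proj") == light.fetch("proj"))
```

Both clients obtain the same data; the difference is where the knowledge about data location lives.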
Peer-to-Peer Distributed Systems
The physical data organization on each machine may be different.
There needs to be an individual internal schema definition at each site (local
internal schema (LIS)).
The enterprise view of the data is described by the global conceptual schema
(GCS), which describes the logical structure of the data at all the sites.
To handle fragmentation and replication, the logical organization of data at each
site needs to be described.
Therefore, there needs to be a third layer in the architecture, the local conceptual
schema (LCS).
The GCS is the union of the LCSs.
User applications and user access to the database are supported by external
schemas (ESs) (Fig. 4.5).

Fig. 4.5: Distributed Database Reference Architecture


The user queries data irrespective of its location or of which local component of
the distributed database system will service it. The distributed DBMS
translates global queries into a group of local queries, which are executed by
distributed DBMS components at different sites that communicate with one
another.
The local database management components are integrated by means of global
DBMS functions. The local conceptual schemas are mappings of the global
schema onto each site.
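
A toy illustration of translating a global query into local queries, assuming each site holds part of one global EMP relation. The site names, rows and function names are invented; a real distributed DBMS would use the schema mappings and a query optimizer instead of asking every site.

```python
# Hypothetical local databases: each site stores part of a global EMP relation.
SITES = {
    "site1": [{"eno": 1, "name": "Doe", "salary": 40},
              {"eno": 2, "name": "Smith", "salary": 70}],
    "site2": [{"eno": 3, "name": "Lee", "salary": 90}],
}


def local_query(site, predicate):
    """Run the selection against the rows stored at one site."""
    return [row for row in SITES[site] if predicate(row)]


def global_query(predicate):
    """Send the same selection to every site and merge the partial results."""
    result = []
    for site in SITES:
        result.extend(local_query(site, predicate))
    return result


# The user states the query once, without naming any site.
high_paid = global_query(lambda row: row["salary"] > 50)
names = [row["name"] for row in high_paid]
print(names)
```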
Fig. 4.6: Functional schematic of an Integrated Distributed DBMS

The detailed components of a distributed DBMS are shown in Fig. 1.15.


One component handles the interaction with users, and another deals with the
storage.
The first component, the user processor, consists of four elements:
i) User interface handler
ii) Semantic data controller
iii) Global query optimizer and decomposer
iv) Distributed execution monitor
• The second component, the data processor, consists of three elements:
i) Local query optimizer
ii) Local recovery manager
iii) Run-time support processor
• In peer-to-peer systems, one expects to find both the user processor
modules and the data processor modules on each machine.

Fig. 1.15: Components of a Distributed DBMS


MDBS architecture
• In the case of logically integrated distributed DBMSs, the global conceptual
schema defines the conceptual view of the entire database, while in the case of
distributed multi-DBMSs, it represents only the collection of some of the local
databases that each local DBMS wants to share.
• Thus, the definition of a global database is different in MDBSs than in
distributed DBMSs. In the latter, the global database is equal to the union of
local databases, whereas in the former it is only a subset of the same union.
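
The difference between the two definitions of the global database can be shown with sets. The relation names and the choice of what each local DBMS shares are made up for illustration.

```python
# Relations stored by each local DBMS (hypothetical names).
local_dbs = {
    "db1": {"EMP", "PAY"},
    "db2": {"PROJ", "ASG"},
}

# What each local DBMS chooses to share in a multidatabase setting (assumed).
shared = {
    "db1": {"EMP"},
    "db2": {"PROJ", "ASG"},
}

union_of_local = set().union(*local_dbs.values())

# Distributed DBMS: the global database is the whole union.
distributed_global = union_of_local

# MDBS: the global database contains only what was shared.
mdbs_global = set().union(*shared.values())

print(sorted(distributed_global))
print(sorted(mdbs_global))
print(mdbs_global <= union_of_local)  # subset test
```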

Models using a Global Conceptual Schema


• In a MDBS, the GCS is defined by integrating either the external schemas of
local autonomous databases or parts of their local conceptual schemas (Fig.
1.16).

Fig. 1.16: MDBS architecture with a GCS

• Users of a local DBMS define their own views on the local database and do not
need to change their applications if they do not want to access data from
another database.
Distributed Database Design
• The design of a distributed computer system involves making decisions on the
placement of data and programs across the sites of a computer network, as
well as possibly designing the network itself.
• The distribution of applications involves two things: the distribution of the
distributed DBMS software and the distribution of the application programs
that run on it.

Design Strategies
The two design strategies are:
1. top-down approach and
2. bottom-up approach.
Top-down design process
The activity begins with a requirements analysis that defines the environment
of the system and elicits both the data and processing needs of all potential
database users.
• The requirements study also specifies where the final system is expected to
stand with respect to the objectives of a distributed DBMS.
• The requirements document is input to two parallel activities: view design
(interfaces) and conceptual design (entity types and relationships).
• The GCS and access pattern information collected by view design are inputs
to the distribution design step. This step is used to design the local
conceptual schemas by distributing the entities over the sites.
• Rather than distributing relations, they are divided into sub-relations, called
fragments, which are then distributed.
• The last step in the design process is the physical design, which maps the
local conceptual schemas to the physical storage devices available at the
corresponding sites. The inputs to this process are the local conceptual
schema and access pattern information about the fragments in these.
The observation and monitoring phase is used for constant monitoring and
periodic adjustment and tuning of the design and development activity.

Fig. 3.2 shows a framework for the top-down design process.

Bottom-up design process
• Sometimes, a number of databases already exist, and the design task involves
integrating them into one database. The bottom-up approach is suitable for
this.
• The starting point is the individual local conceptual schemas. The process
consists of integrating local schemas into the global conceptual schema.

Next: Distribution design issues, Fragmentation, Allocation
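
To make the idea of fragments concrete, here is a small sketch that splits a relation by a predicate, assigns each fragment to a site, and rebuilds the relation from the fragments. Splitting by rows on a budget threshold is just one assumed way to fragment; the PROJ data, the threshold and the site names are invented.

```python
def horizontal_fragments(relation, predicates):
    """Split a relation (list of dict rows) into named fragments by predicate."""
    return {name: [row for row in relation if pred(row)]
            for name, pred in predicates.items()}


# Hypothetical global relation PROJ.
proj = [
    {"pno": "P1", "budget": 150000},
    {"pno": "P2", "budget": 135000},
    {"pno": "P3", "budget": 250000},
    {"pno": "P4", "budget": 310000},
]

fragments = horizontal_fragments(proj, {
    "PROJ1": lambda row: row["budget"] <= 200000,
    "PROJ2": lambda row: row["budget"] > 200000,
})

# Allocation: which site stores which fragment (assumed placement).
allocation = {"PROJ1": "site1", "PROJ2": "site2"}

# The predicates cover every row exactly once, so the union of the
# fragments gives back the original relation.
reconstructed = sorted(fragments["PROJ1"] + fragments["PROJ2"],
                       key=lambda row: row["pno"])
print(reconstructed == proj)
```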
