Assignment # 2: Submitted by Submitted To Class Semester Roll No
Submitted by
Junaid Ahmad
Submitted to
BSCS (Evening B)
Semester
6th
Roll No
15090
ANS:
Distributed query optimization requires evaluation of a large number of query trees,
each of which produces the required results of a query. This is primarily due to the
presence of a large amount of replicated and fragmented data. Hence, the target is to find
a good, near-optimal solution rather than the guaranteed best solution.
The main issues for distributed query optimization are as follows.
1. Optimal Utilization of Resources
A distributed system has a number of database servers at the various sites to perform
the operations pertaining to a query. Following are the approaches for optimal resource
utilization:
Operation Shipping
In operation shipping, the operation is run at the site where the data is stored and
not at the client site. The results are then transferred to the client site. This is appropriate for
operations where the operands are available at the same site. Example: Select and
Project operations.
Data Shipping
In data shipping, the data fragments are transferred to the database server,
where the operations are executed. This is used in operations where the operands are
distributed at different sites. This is also appropriate in systems where the
communication costs are low, and local processors are much slower than the client
server.
Hybrid Shipping
This is a combination of data and operation shipping. Here, data fragments are
transferred to the high-speed processors, where the operation runs. The results are
then sent to the client site.
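As a rough illustration of the trade-off between these approaches, the sketch below compares the bytes moved by operation shipping and by data shipping for a selection over one remote fragment, and picks the cheaper. The function name, the selectivity parameter, and the byte-count cost model are assumptions made for this example and are not taken from any particular DDBMS.

```python
def choose_shipping(fragment_bytes: int, selectivity: float) -> str:
    """Pick a shipping strategy for a selection over one remote fragment.

    Hypothetical cost model: cost is the number of bytes sent over the network.
    """
    # Operation shipping: run the selection where the data lives,
    # then send only the matching rows to the client.
    operation_cost = fragment_bytes * selectivity
    # Data shipping: send the whole fragment, then select at the client.
    data_cost = fragment_bytes
    return "operation" if operation_cost < data_cost else "data"


print(choose_shipping(1_000_000, 0.05))  # selective query
print(choose_shipping(1_000_000, 1.0))   # query keeps every row
```

A fuller model would also weigh processor speed at each site, which is the factor that makes data or hybrid shipping attractive.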
2. Query Trading
In the query trading algorithm for distributed database systems, the controlling/client site
for a distributed query is called the buyer and the sites where the local queries execute
are called sellers. The buyer formulates a number of alternatives for choosing sellers
and for reconstructing the global results. The target of the buyer is to achieve the
optimal cost.
The algorithm starts with the buyer assigning sub-queries to the seller sites. The
optimal plan is created from the locally optimized query plans proposed by the sellers,
combined with the communication cost for reconstructing the final result. Once the
global optimal plan is formulated, the query is executed.
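The buyer's choice can be sketched as a small search over seller offers. This is a simplified, single-round illustration only: the bids, the transfer costs, and the function name best_plan are hypothetical, and a real query trading algorithm may negotiate over several rounds.

```python
from itertools import product

# Hypothetical bids: for each sub-query, each seller offers a local execution cost.
bids = {
    "q1": {"S1": 10, "S2": 14},
    "q2": {"S2": 7, "S3": 5},
}
# Hypothetical cost of shipping a seller's partial result back to the buyer.
transfer_cost = {"S1": 4, "S2": 1, "S3": 6}


def best_plan(bids, transfer_cost):
    """Return (total_cost, assignment) minimizing local plus communication cost."""
    subqueries = sorted(bids)
    best = None
    # Enumerate every way of assigning one seller to each sub-query.
    for sellers in product(*(sorted(bids[q]) for q in subqueries)):
        cost = sum(bids[q][s] + transfer_cost[s] for q, s in zip(subqueries, sellers))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(subqueries, sellers)))
    return best


print(best_plan(bids, transfer_cost))
```

Note that S3 offers the cheapest local plan for q2, yet S2 wins once the communication cost is added, which is the point of combining both costs.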
3. Reduction of Solution Space
Finding an optimal solution generally involves reducing the solution space so that the cost of
query execution and data transfer is reduced. This can be achieved through a set of heuristic
rules, just like the heuristics used in centralized systems.
Following are some of the rules:
Perform selection and projection operations as early as possible. This reduces the data flow
over the communication network.
Simplify operations on horizontal fragments by eliminating selection conditions which are not
relevant to a particular site.
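The second rule can be illustrated with a minimal sketch that discards horizontal fragments whose defining predicate cannot match the query. The fragment names, the salary ranges, and the function relevant_fragments are assumptions made for this example.

```python
# Hypothetical horizontal fragments of an EMP relation, each defined by a
# salary range [low, high) and stored at a different site.
fragments = {
    "EMP1@S1": (0, 30_000),
    "EMP2@S2": (30_000, 60_000),
    "EMP3@S3": (60_000, 1_000_000),
}


def relevant_fragments(fragments, query_low, query_high):
    """Keep only fragments whose range overlaps the query range [query_low, query_high)."""
    return [
        name
        for name, (low, high) in fragments.items()
        if low < query_high and query_low < high
    ]


# A query for salaries in [40000, 50000) only needs the fragment at S2.
print(relevant_fragments(fragments, 40_000, 50_000))
```

Sites whose fragments are pruned this way never receive the sub-query, so neither their processing time nor any network transfer is spent on them.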
ANS: A DDB can be homogeneous or heterogeneous. That is, either all the
DBs in the DDB are of the same type, with the same software, hardware, operating system, etc., or
at least one of them is different. When a user sends a query or request, a
homogeneous system can manage the query easily, as there is no difference
among the DBs. A heterogeneous system, however, cannot process the query the same way at all the
DBs.
Multi-database
Federated
In this method, users access different databases through a centralized or global
conceptual schema. That is, a common schema is created to manage all the DB
requests, and users access the DBs through that common schema.
Hence, even though the data is fragmented or distributed over the DBs, the user
accesses the central schema to process a query and rarely accesses the nearest DB
directly.
When a heterogeneous DDB uses the federated method to process a query, there are a lot
of issues that it needs to deal with.
Data Models
A DDB will have different databases distributed over the network. These DBs will
have their own data models, such as relational, document, network, object-oriented,
hierarchical, etc. The federated schema should be made compatible with all these types
of data models to handle all the DB requests.
Constraints
Each DB will have its own mechanism for accessing the data and defining the
constraints. For example, in the relational model a primary-foreign key constraint determines
the relationship between the tables, while in the hierarchical model a parent-child relationship
exists. This makes the federated schema complex, as it has to handle all these types of
constraints.
Query Language
As the DB system varies from location to location, the query language used by it
can also change. One system may use standard SQL while another may use a vendor-specific
SQL dialect, such as that of DB2 or Sybase, or a non-SQL language.
Hence the federated schema should provide a common language which is compatible with all
the query languages, and should be able to generate code that can be executed in
different DB systems.
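A minimal sketch of this translation step follows. It assumes, for simplicity, that both local systems accept SQL and differ only in table and column names. The site names, the schemas, and the translate function are all hypothetical.

```python
# Hypothetical mapping from the federated (global) schema to each local schema.
local_schemas = {
    "site_a": {"table": "employee", "columns": {"emp_no": "eno", "emp_name": "ename"}},
    "site_b": {"table": "STAFF", "columns": {"emp_no": "STAFF_ID", "emp_name": "FULL_NAME"}},
}


def translate(global_columns, site):
    """Rewrite a global projection into a SELECT statement for one local site."""
    schema = local_schemas[site]
    local_columns = [schema["columns"][c] for c in global_columns]
    return f"SELECT {', '.join(local_columns)} FROM {schema['table']}"


for site in local_schemas:
    print(site, "->", translate(["emp_no", "emp_name"], site))
```

Sites with different data models or query languages would need a separate translator each, which is where most of the complexity described above comes from.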
A DBMS must construct and store an internal representation for each view that it
supports. This occurs at view definition time. In systems such as SQL/DS, this internal
representation is a parse tree for the view definition statement. When a view is later
referenced in a query, a view composition operation is performed to combine the view’s
parse tree with the query parse tree. The result is a composite parse tree which
contains references only to real stored tables. In creating the internal representation of a
view, names in the view definition statement are bound to the specific objects they
reference, namely tables and other views. A view is therefore logically dependent on the
continued existence of all objects that it references. If an object is dropped or substantially
changed (for example, if the columns of a table are rearranged or a view is redefined),
then the views referencing those objects must be invalidated.
Within the last few years, distributed database management systems (DDBMS)
have become a rapidly growing field of investigation and a number of implementations
have been reported [Williams81, Rothnie80, Stonebraker77]. Among the numerous goals
of a DDBMS, two have been recognized as key objectives: site autonomy and data
distribution transparency.
Site autonomy means that each site can operate on its own data as a stand-alone,
single-site DBMS, and that each site retains local control of its own data, even if
the site participates in the execution of a distributed query. This guarantees better
resiliency to failures of sites and communication lines, since there are no centralized
functions or services, such as a global dictionary or centralized deadlock detector.
Further, each site performs all operations on its own local data, including authorization
checking and, of course, database accesses and updates.
As mentioned in the previous section, views are defined in terms of queries which
may reference local and non-local tables and views. Queries may be embedded in
programs, which are precompiled by (D)DBMSs such as System R [Astrahan76] and R*
[Williams81]. The result of precompilation is a set of access modules, defining the
execution plan for the query, which are stored in the (distributed) database.
Precompilation also creates dependencies (as described in Section 2) for the program on the
tables and views referenced in the program. Thus, if a table is dropped or a view
redefined, the program must be invalidated.
At execution time of a program, the system checks whether the program is valid.
If the program is still valid, the system loads and executes the necessary access
modules. If the program has been invalidated, the system may try to recompile the
program; if recompilation succeeds, the program is executed, otherwise an error is
reported [Ng82]. Since distribution transparency requires that the system have the same
behavior with respect to users as a centralized DBMS, a correct implementation of
views in DDBMS must ensure that views are dependent on the objects they reference,
and that programs referencing views are dependent on those views. These
requirements ensure a consistent usage of views by users.
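The dependency requirement can be sketched as a small graph in which invalidating one object cascades to everything that references it. This is an illustrative in-memory model, not the mechanism of System R or R*. All object names and the functions register and invalidate are hypothetical.

```python
from collections import defaultdict

# dependents[x] lists the views and programs that reference object x.
# All object names here are hypothetical.
dependents = defaultdict(set)
valid = {}


def register(obj, references=()):
    """Record a table, view, or program and the objects it references."""
    valid[obj] = True
    for ref in references:
        dependents[ref].add(obj)


def invalidate(obj):
    """Mark obj invalid and cascade to everything that depends on it."""
    stack = [obj]
    while stack:
        current = stack.pop()
        if valid.get(current):
            valid[current] = False
            stack.extend(dependents[current])


register("EMP")
register("PROJ")
register("V_STAFF", references=["EMP"])        # a view over a table
register("V_SENIOR", references=["V_STAFF"])   # a view over a view
register("payroll_prog", references=["V_SENIOR", "PROJ"])

invalidate("EMP")  # dropping EMP invalidates both views and the program
print(valid)
```

At execution time, a system following this model would look up the program's flag and attempt recompilation when it is False.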
Q#4) Define Transparency of the distributed database and also write
its factors?
ANS: By definition, a DDBMS should make the
distribution transparent to the user. Transparency hides implementation details from the
user. For example, in a centralized DBMS, data independence is a form of transparency:
it hides changes in the definition and organization of the data from the user. A DDBMS
may provide various levels of transparency. However, they all share the same
overall objective: to make the use of the distributed database equivalent to that of a
centralized database.
We can identify four main types of transparency in a DDBMS:
o Distribution transparency
o Transaction transparency
o Performance transparency
o DBMS transparency
Distribution transparency
Distribution transparency allows the user to perceive the database as a single, logical
entity. If a DDBMS exhibits distribution transparency, then the user does not need to
know that the data is fragmented (fragmentation transparency) or the location of data items
(location transparency).
Distribution transparency can be classified into:
o Fragmentation transparency
o Location transparency
o Replication transparency
o Local Mapping transparency
o Naming transparency
Fragmentation transparency
Fragmentation transparency is the highest level of distribution transparency. If fragmentation
transparency is provided by the DDBMS, then the user does not need to know that the
data is fragmented. As a result, database accesses are based on the global schema, so
the user does not need to specify fragment names or data locations.
Location transparency
Location transparency is the middle level of distribution transparency. With location
transparency, the user must know how the data has been fragmented but still does not have
to know the location of the data.
Replication transparency
Closely related to location transparency is replication transparency, which means that
the user is unaware of the replication of fragments. Replication transparency is implied
by location transparency.
Local mapping transparency
This is the lowest level of distribution transparency. With local mapping transparency, the
user needs to specify both fragment names and the location of data items, taking into
consideration any replication that may exist.
Clearly, a query written at this level is more complex and time-consuming for the user to
enter than one written with fragmentation transparency. It is unlikely that a system that
provides only this level of transparency would be acceptable to end-users.
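The difference between the three levels described above comes down to how much the user must supply and how much the system resolves from its catalog. The sketch below illustrates this with a hypothetical in-memory catalog. The fragment names, site names, and the resolve function are assumptions for this example.

```python
# Hypothetical global catalog: relation -> fragments -> sites holding a copy.
catalog = {
    "Branch": {
        "F1": ["S1", "S2"],
        "F2": ["S3"],
    }
}


def resolve(relation, fragment=None, site=None):
    """Return the (fragment, site) pairs a query must touch.

    relation only               -> fragmentation transparency
    relation + fragment         -> location transparency
    relation + fragment + site  -> local mapping transparency
    """
    fragments = catalog[relation]
    names = [fragment] if fragment else sorted(fragments)
    # When the user gives no site, the system picks the first listed copy.
    return [(f, site or fragments[f][0]) for f in names]


print(resolve("Branch"))              # user names only the relation
print(resolve("Branch", "F1"))        # user names the fragment
print(resolve("Branch", "F1", "S2"))  # user names fragment and site
```

The fewer arguments the user has to pass, the higher the level of transparency the system is providing.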
Naming transparency
As a corollary to the above distribution transparencies, we have naming transparency.
As in a centralized database, each item in a distributed database must have a unique
name. Therefore, the DDBMS must ensure that no two sites create a database object
with the same name. One solution to this problem is to create a central name server,
which has the responsibility for ensuring uniqueness of all names in the system. However,
this approach results in:
o Loss of some local autonomy
o Performance problems, if the central site becomes a bottleneck
o Low availability: if the central site fails, the remaining sites cannot create
any new database objects
An alternative solution is to prefix an object with the identifier of the site that created it.
For example, the relation Branch created at site S1 might be named S1.Branch. Similarly, we
need to be able to identify each fragment and each of its copies. Thus, copy 2 of
fragment 3 of the Branch relation created at site S1 might be referred to as
S1.Branch.F3.C2. However, this results in loss of distribution transparency.
An approach that resolves the problems with both these solutions uses
aliases (sometimes called synonyms) for each database object. Thus, S1.Branch.F3.C2
might be known as LocalBranch by the user at site S1. The DDBMS has the task of
mapping an alias to the appropriate database object.
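A minimal sketch of the alias approach, assuming a simple per-site lookup table: the functions full_name and resolve_alias and the aliases dictionary are hypothetical names introduced only for illustration.

```python
def full_name(site, relation, fragment, copy):
    """Build a site-prefixed name such as S1.Branch.F3.C2."""
    return f"{site}.{relation}.F{fragment}.C{copy}"


# Per-site alias tables mapping a user-friendly alias to the real object name.
aliases = {"S1": {"LocalBranch": full_name("S1", "Branch", 3, 2)}}


def resolve_alias(site, name):
    """Map a user's alias to the real object name, or return the name unchanged."""
    return aliases.get(site, {}).get(name, name)


print(resolve_alias("S1", "LocalBranch"))
```

Because each site keeps its own alias table, no central name server is needed, and users never have to type the site-prefixed name.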
Transaction Transparency
Transaction transparency in a DDBMS environment ensures that all distributed
transactions maintain the distributed database's integrity and consistency. A distributed
transaction accesses data stored at more than one location. Each transaction is divided
into a number of subtransactions, one for each site that has to be accessed; a
subtransaction is represented by an agent.
The DDBMS must also ensure the atomicity of each subtransaction. Transaction
transparency in a distributed DBMS is complicated by the fragmentation, allocation, and
replication schemas.
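One common way to keep a distributed transaction atomic is a voting step in the style of two-phase commit. The text above does not name a protocol, so the sketch below is only an assumed illustration. The Agent class and run_distributed function are hypothetical and ignore failures, timeouts, and logging.

```python
class Agent:
    """Runs one subtransaction at one site (hypothetical, in-memory only)."""

    def __init__(self, site, can_commit=True):
        self.site = site
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Vote on whether the local subtransaction is able to commit.
        return self.can_commit

    def finish(self, commit):
        self.state = "committed" if commit else "aborted"


def run_distributed(agents):
    """Commit at every site only if every agent votes yes; otherwise abort everywhere."""
    votes = [agent.prepare() for agent in agents]  # collect every vote first
    decision = all(votes)
    for agent in agents:
        agent.finish(decision)
    return decision


ok = [Agent("S1"), Agent("S2")]
bad = [Agent("S1"), Agent("S2", can_commit=False)]
print(run_distributed(ok), [a.state for a in ok])
print(run_distributed(bad), [a.state for a in bad])
```

In the second run a single "no" vote aborts the subtransaction at S1 as well, so the transaction takes effect at all sites or at none.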
Q#5) Explain the Semantic Integrity Control? How can you differ b/w
Centralized Semantic Integrity Control and Distributed Semantic
Integrity Control?
ANS:
Semantic Integrity Control
Predefined constraints are based on simple keywords. Through them, it
is possible to express concisely the more common constraints of the relational
model, such as non-null attribute, unique key, foreign key, or functional
dependency.
Employee number in the relation EMP cannot be null.
ENO NOT NULL IN EMP
The project number PNO in relation ASG is a foreign key matching the primary
key PNO of the relation PROJ.
PNO IN ASG REFERENCES PNO IN PROJ
Precompiled constraints express preconditions that must be satisfied by all
tuples in a relation for a given update type. The update type, which might be
INSERT, DELETE, or MODIFY, permits restricting the integrity control.
The budget of a project is between 500K and 1000K.
CHECK ON PROJ (BUDGET+ >= 500000 AND BUDGET+ <= 1000000)
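To make the three example constraints concrete, the sketch below expresses each as a predicate over a tuple held as a Python dictionary. The sample rows and the function names are hypothetical. Only the attribute and relation names (ENO, PNO, BUDGET, EMP, ASG, PROJ) come from the examples above.

```python
# Hypothetical sample data for the PROJ relation.
proj = [{"PNO": "P1", "BUDGET": 750_000}]


def check_emp(row):
    """ENO in EMP must not be null."""
    return row.get("ENO") is not None


def check_asg(row, proj_rows):
    """PNO in ASG must match a PNO in PROJ (foreign key)."""
    return row["PNO"] in {p["PNO"] for p in proj_rows}


def check_proj_budget(row):
    """Budget must lie between 500K and 1000K."""
    return 500_000 <= row["BUDGET"] <= 1_000_000


print(check_emp({"ENO": None, "ENAME": "A. Khan"}))         # rejected: null key
print(check_asg({"ENO": "E2", "PNO": "P9"}, proj))          # rejected: no such project
print(check_proj_budget({"PNO": "P2", "BUDGET": 600_000}))  # accepted
```

The first and third checks need only the tuple being updated, while the foreign key check must read another relation, which matters once relations are stored at different sites.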
Distributed Semantic Integrity Control
Definition
Assertions can involve data stored at different sites, so the storage of the constraints
must be decided so as to minimize the cost of integrity checking. One strategy is
based on a taxonomy of integrity constraints that distinguishes three classes:
Individual constraints
Single-relation single-variable constraints. They refer only to the tuple to be
updated, independently of the rest of the database.
Set-Oriented constraints
These include single-relation multivariable constraints such as functional dependency
and multi-relation multivariable constraints such as foreign key constraints.