Olap Exp05
Olap Exp05
Olap Exp05
3. Theory :
. Data is one the vital entity in the database is managed by two using two major
database approaches known as centralized database management approach and distributed
database management approach. Centralized database management is one the traditional
approach of database management, in which all of the data in database is sited on central
location. Centralized database approach has overcome several limitations of file oriented
approach of prior times. Since in centralized database approach the data is placed on central
repository hence it is easy to access or extract data from multiple tables as compare to
distributed database approach where data is distributed over several sites. In centralized
database the database query can be easily transformed into set of relational algebra’s
operation, but in distributed database system one has to put more effort to analyze the
amount of data exchange in addition to corresponding set of relational algebra’s operations.
Distributed database system is defined as collection of logically interrelated data distributed
over several sites.
The number of nodes in distributed system is connected either by using wired or
wireless network media. One of the major issues in the distributed database design is the
placement of data and program across the number of computer or site available in the
system. After making placement of data and application program one has to focus on
transforming a distributed query into equivalent low level query so that actual
Prof . Apeksha Mohite Department of Information Technology
implementation and execution strategy of the query can be carried out. In distributed
database system, it is obvious the database query will extract data from several different
sites, so in this case the important factor is to reduce to amount of data transmission to
maximum extent.
Before proceeding further let us first understand the concept of Join and Semi joins.
Join is one of the most imperative operations in database theory that is used to
extract information from two or more than two tables. Technically join operation is one of
EMP
ENO ENAME JOB MGR_NO HIRE_D SALARY P_NO D_NO
DEPT
D_NO DNAME LOC
In EMP table we have 14 tuples and we assume each tuple is 100bytes long.
We assume size of each attribute in EMP as below:
ENO ENAME JOB MGR_NO HIRE_D SALARY P_NO D_NO
(5 bytes) (15 bytes) (10 bytes) (6 bytes) (20 bytes) (20 bytes) (20 bytes) (5 bytes)
It is assumed that both relations are not fragmented. So total memory consumed by EMP and
DEPT table is 1400 bytes and 600 bytes respectively. It is further assumed that the following query
is requested at site3.
Query at site3 : Find the name, Dname, Deptno and job of the employee.
Select ENAME,DNAME,JOB ,D_NO FROM EMP,DEPT;
Query Processing Using Joins in Distributed Database System:
The original query that extract data from two tables EMP and DEPT in distributed
database system can be implemented and executed in three different ways as given below
in Case I, II and III.
After join the result produced at site 3 will have 14 rows and attributes
ENAME,DNAME,JOB,D_NO (total 79 bytes)
Case I: In this case to implement and execute the query one has to transfer both join table
EMP and DEPT to the resultant location i.e. at site 3. The following diagram shows the
query plan of above said case.
Site 3
The total number of bytes transferred in this case will be computed as follow:
1. Transmission from S1 to S3 : 14 * 100 = 1400 bytes.
2. Transmission from S2 to S3 : 04 * 150 = 600 bytes.
3. Total transmission to S3 = 1400 + 600 = 2000 bytes.
The following diagram shows the query plan of above said case.
Site 3
The total number of bytes transferred in this case will be computed as follow:
1. Transmission from S1 to S2 : 14 * 100 = 1400 bytes.
2. Transmission from S2 to S3 : 14 * 79 = 1106 bytes.
3. Total transmission to S3 = 1400 + 1106 = 2506 bytes.
Case III: This is just reverse case of Case II; in this case DEPT table will be transmitted at Site 1
where EMP table is already available. Now apply the join operation here and transmit the required
result to site 3. The following diagram shows the query plan of above said case.
Site 3
Execution Plan:
The optimizer selects the best strategy and implements the same. Thus, from the above
analysis we can conclude that the case III provides efficient results with best resource
utilization. Thus, we have seen how joins can be used for processing distributed queries.
Execution Plan: