DISTRIBUTED DATABASES Presentation
Definitions
• A distributed database is a collection of multiple interconnected databases that are physically
spread across various locations and communicate via a computer network.
• Put simply, it is a database system that stores data in multiple locations instead of one.
Rather than putting all data on one server or one computer, data is placed on multiple servers
or in a cluster of computers consisting of individual nodes.
Features of Distributed Databases
1. Databases in the collection are logically interrelated with each other. Often they represent
a single logical database.
2. Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of other sites.
3. The processors at the sites are connected via a network. They do not form a multiprocessor
configuration. (A multiprocessor is a set of multiple processors that execute instructions
simultaneously.)
4. A distributed database is not a loosely connected file system. Rather, it is a loosely coupled
system in which the individual components are not so tightly bound together that a
change in one breaks the others. This means that a failure at one of the sites or systems need
not stop the system as a whole from functioning, because another site can complete
the task.
5. A distributed database incorporates transaction processing but it is not synonymous with a
transaction processing system.
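Features 1 and 2 above can be illustrated with a minimal sketch. It assumes three hypothetical sites, each modelled by its own in-memory SQLite database standing in for an independent DBMS, and a hypothetical customer table that is not part of the slides. The helper logical_query is also an illustrative name; it simply merges per-site results so that the caller sees one logical database.

```python
import sqlite3

# Hypothetical site names; each in-memory SQLite connection stands in for
# an independent DBMS running at one site.
SITES = ["location_1", "location_2", "location_3"]


def open_site() -> sqlite3.Connection:
    """Create one site's local database with its own copy of the schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def logical_query(sites: dict, sql: str) -> list:
    """Run the same query at every site and merge the results,
    so the caller sees a single logical database."""
    rows = []
    for conn in sites.values():
        rows.extend(conn.execute(sql).fetchall())
    return sorted(rows)


sites = {name: open_site() for name in SITES}

# Each site stores and manages only its own local rows.
sites["location_1"].execute("INSERT INTO customer VALUES (1, 'Amina')")
sites["location_2"].execute("INSERT INTO customer VALUES (2, 'Brian')")
sites["location_3"].execute("INSERT INTO customer VALUES (3, 'Chen')")

all_customers = logical_query(sites, "SELECT id, name FROM customer")
print(all_customers)
```

A real distributed DBMS would also handle networking, query planning and transactions across sites; this sketch only shows the idea of physically separate stores presented as one logical collection.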
Diagram 1: Sites (Location 2, Location 3, Location 4), each with its own memory and database,
linked by a communication channel.
Options for distributing a database
1. Data replication – Identical copies of the data are kept at different sites. The
whole database may be reproduced and maintained at all or a few sites, or a
particular table may be reproduced and maintained at all or a few of the sites.
2. Horizontal partitioning – A table is partitioned by records (rows) without
changing the structure of the table. For example, if you have a table EMP
which stores data according to the schema EMP(Eno, Ename, Dept,
Dept_location), then horizontal partitioning of EMP on Dept_location means
splitting the employee records according to the department location values and
storing a different set of employee records at each location. The data at
different locations will be different but the schema will be the same, i.e.
EMP(Eno, Ename, Dept, Dept_location). Each horizontal fragment must have all
the columns of the original base table.
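The two options above can be sketched in a few lines of Python. The EMP rows below are made-up sample data, and the function names horizontal_partition and replicate are illustrative only. The sketch fragments EMP on Dept_location, then shows that every fragment keeps the full schema and that the union of the fragments gives back the original table.

```python
from collections import defaultdict

# Made-up sample rows for the schema EMP(Eno, Ename, Dept, Dept_location).
EMP = [
    {"Eno": 1, "Ename": "Asha", "Dept": "Sales", "Dept_location": "Nairobi"},
    {"Eno": 2, "Ename": "Ben", "Dept": "Sales", "Dept_location": "Nairobi"},
    {"Eno": 3, "Ename": "Chloe", "Dept": "IT", "Dept_location": "Mombasa"},
    {"Eno": 4, "Ename": "David", "Dept": "HR", "Dept_location": "Kisumu"},
]


def horizontal_partition(rows, column):
    """Split rows into fragments by the value of one column.
    Every fragment keeps all columns of the base table."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[row[column]].append(row)
    return dict(fragments)


def replicate(rows, site_names):
    """Keep an identical copy of the rows at every named site."""
    return {site: [dict(row) for row in rows] for site in site_names}


fragments = horizontal_partition(EMP, "Dept_location")
for location, rows in fragments.items():
    print(location, [row["Eno"] for row in rows])

# The union of all fragments reconstructs the original table.
reconstructed = sorted(
    (row for rows in fragments.values() for row in rows),
    key=lambda row: row["Eno"],
)

# Replication: the same table copied to two hypothetical sites.
replicas = replicate(EMP, ["site_a", "site_b"])
```

Partitioning gives each location only its own rows, while replication gives each site every row; real systems often combine the two.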
Reasons for distributing a database
• Distributed nature of organisational units – Most organisations today are
subdivided into multiple units that are physically distributed across the globe. Each unit requires
its own set of local data. Thus the overall database of the organisation becomes distributed.
• Need for sharing data – The multiple organisational units often need to communicate with
each other and share their data and resources. This demands common databases or replicated
databases that are used in a synchronised manner.
• Reliability and availability – In a distributed database management system (DDBMS), if one
system fails or stops working, another system can complete the task. For example, if the server
or computer at location one fails, a server at any other location that holds the required data
can serve the client request(s).
• Support for both OLTP and OLAP – Online Transaction Processing (OLTP) and Online Analytical
Processing (OLAP) run on different systems which may share common data. OLTP and OLAP are
two common kinds of data processing workload. The difference between the two
is that OLAP is used for complex data analysis while OLTP is used for real-time processing of
online transactions at scale.
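The reliability and availability point above can be sketched as a simple read with failover. Everything here is hypothetical: the Site class, the SiteDown exception, the site names and the sample record are assumptions made for illustration, and the sketch assumes the record is replicated at every site.

```python
class SiteDown(Exception):
    """Raised when a site cannot serve a request."""


class Site:
    """A hypothetical site holding a replica of the data."""

    def __init__(self, name, data, up=True):
        self.name = name
        self.data = dict(data)
        self.up = up

    def read(self, key):
        if not self.up:
            raise SiteDown(self.name)
        return self.data[key]


def read_with_failover(sites, key):
    """Try each site in order and return the first successful answer
    together with the name of the site that served it."""
    for site in sites:
        try:
            return site.name, site.read(key)
        except SiteDown:
            continue  # this site failed, so try the next replica
    raise SiteDown("no site available")


record = {"order_42": "shipped"}  # made-up sample data
sites = [
    Site("location_1", record, up=False),  # simulate a failed server
    Site("location_2", record),
    Site("location_3", record),
]

served_by, value = read_with_failover(sites, "order_42")
print(served_by, value)
```

With location_1 marked as down, the request is served by location_2. A production system would also need to keep the replicas consistent, which this sketch does not attempt.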
Distribution transparency