Database Notes

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 33

Database management system:

A database management system is the software system that allows users to define, create and

maintain a database and provides controlled access to the data. A Database Management System

(DBMS) is basically a collection of programs that enables users to store, modify, and extract

information from a database as per the requirements. DBMS is an intermediate layer between

programs and the data. Programs access the DBMS, which then accesses the data. There are

different types of DBMS ranging from small systems that run on personal computers to huge

systems that run on mainframes. The following are main examples of database applications: •

Computerized library systems • Automated teller machines • Flight reservation systems •

Computerized parts inventory systems A database management system is a piece of software that

provides services for accessing a database, while maintaining all the required features of the data.

Commercially available Database management systems in the market are dbase, FoxPro, IMS and

Oracle, MySQL, SQL Servers and DB2 etc. These systems allow users to create update, and extract

information from their databases.Compared to a manual filing system, the biggest advantages to a

computerized database system are speed, accuracy, and’ accessibility.

Advantages of DBMS:
a)Controls Redundancy: In file system, each application has its own private files, which cannot

be shared between multiple applications. 1:his can often lead to considerable redundancy in the

stored data, which results in wastage of storage space. By having centralized database most of this

can be avoided. It is not possible that all redundancy should be eliminated. Sometimes there are

sound business and technical reasons for maintaining multiple copies of the same data. In a

database system, however this redundancy can be controlled.


b) Integrity can be enforced: Integrity of data means that data in database is always accurate,

such that incorrect information cannot be stored in database. In order to maintain the integrity of

data, some integrity constraints are enforced on the database. A DBMS should provide capabilities

for defining and enforcing the constraints.

c) Inconsistency can be avoided : When the same data is duplicated and changes are made at one

site, which is not propagated to the other site, it gives rise to inconsistency and the two entries

regarding the same data will not agree. At such times the data is said to be inconsistent. So, if the

redundancy is removed chances of having inconsistent data is also removed.

d) Data can be shared: As explained earlier, the data about Name, Class, Father __name etc. of

General_Office is shared by multiple applications in centralized DBMS as compared to file system

so now applications can be developed to operate against the same stored data. The applications

may be developed without having to create any new stored files.

e) Standards can be enforced : Since DBMS is a central system, so standard can be enforced

easily may be at Company level, Department level, National level or International level. The

standardized data is very helpful during migration or interchanging of data. The file system is an

independent system so standard cannot be easily enforced on multiple independent applications.

f) Restricts unauthorized access: When multiple users share a database, it is likely that some

users will not be authorized to access all information in the database. For example, account office

data is often considered confidential, and hence only authorized persons are allowed to access such

data. In addition, some users may be permitted only to retrieve data, whereas other are allowed

both to retrieve and to update. Hence, the type of access operation retrieval or update must also be

controlled. Typically, users or user groups are given account numbers protected by passwords,
which they can use to gain access to the database. A DBMS should provide a security and
authorization subsystem, which the DBA uses to create accounts and to specify account

restrictions. The DBMS should then enforce these restrictions automatically.

g)Solves Enterprise Requirement than Individual Requirement: Since many types of users

with varying level of technical knowledge use a database, a DBMS should provide a variety of

user interface. The overall requirements of the enterprise are more important than the individual

user requirements. So, the DBA can structure the database system to provide an overall service

that is “best for the enterprise”.

h) Provides Backup and Recovery: A DBMS must provide facilities for recovering from

hardware or software failures. The backup and recovery subsystem of the DBMS is responsible

for recovery. For example, if the computer system fails in the middle of a complex update program,

the recovery subsystem is responsible for making sure that the .database is restored to the state it

was in before the program started executing.

i)Cost of developing and maintaining system is lower: It is much easier to respond to

unanticipated requests when data is centralized in a database than when it is stored in a

conventional file system. Although the initial cost of setting up of a database can be large, but the

cost of developing and maintaining application programs to be far lower than for similar service

using conventional systems. The productivity of programmers can be higher in using non-

procedural languages that have been developed with DBMS than using procedural languages.

j) Data Model can be developed : The centralized system is able to represent the complex data

and interfile relationships, which results better data modeling properties. The data madding

properties of relational model is based on Entity and their Relationship, which is discussed in detail

in chapter 4 of the book.


k)Concurrency Control : DBMS systems provide mechanisms to provide concurrent access of

data to multiple users.

Disadvantages of DBMS
a) Complexity : The provision of the functionality that is expected of a good DBMS makes the

DBMS an extremely complex piece of software. Database designers, developers, database

administrators and end-users must understand this functionality to take full advantage of it. Failure

to understand the system can lead to bad design decisions, which can have serious consequences

for an organization.

b) Size : The complexity and breadth of functionality makes the DBMS an extremely large piece

of software, occupying many megabytes of disk space and requiring substantial amounts of

memory to run efficiently.

c) Performance: Typically, a File Based system is written for a specific application, such as

invoicing. As result, performance is generally very good. However, the DBMS is written to be

more general, to cater for many applications rather than just one. The effect is that some

applications may not run as fast as they used to.

d)Higher impact of a failure: The centralization of resources increases the vulnerability of the

system. Since all users and applications rely on the ~vailabi1ity of the DBMS, the failure of any

component can bring operations to a halt.

e) Cost of DBMS: The cost of DBMS varies significantly, depending on the environment and

functionality provided. There is also the recurrent annual maintenance cost.


f) Additional Hardware costs: The disk storage requirements for the DBMS and the database

may necessitate the purchase of additional storage space. Furthermore, to achieve the required

performance it may be necessary to purchase a larger machine, perhaps even a machine dedicated

to running the DBMS. The procurement of additional hardware results in further expenditure.

g) Cost of Conversion: In some situations, the cost oftlle DBMS and extra hardware may be

insignificant compared with the cost of converting existing applications to run on the new DBMS

and hardware. This cost also includes the cost of training staff to use these new systems and

possibly the employment of specialist staff to help with conversion and running of the system.

This cost is one of the main reasons why some organizations feel tied to their current systems and

cannot switch to modern database technology.

Type of Database System:

On the basis of the number of users:

• Single-user DBMS

• Multi-user DBMS

On the basis of the site location

• Centralized DBMS

• Parallel DBMS

• Distributed DBMS

• Client/server DBMS
The database system may be multi-user or single-user. The configuration of the hardware and the

size of the organization will determine whether it is a multi-user system or a single user system.

In single user system the database resides on one computer and is only accessed by one user at a

time. This one user may design, maintain, and write database programs. Due to large amount of

data management most systems are multi-user. In this situation the data are both integrated and

shared. A database is integrated when the same information is not recorded in two places. For

example, both the Library department and the Account department of the college database may

need student addresses. Even though both departments may access different portions of the
database, the students’ addresses should only reside in one place. It is the job of the DBA to make

sure that the DBMS makes the correct addresses available from one central storage area.

On the basis of site location


a)Centralized Database System:

The centralized database system consists of a single processor together with its associated data

storage devices and other peripherals. It is physically confined to a single location. Data can be

accessed from the multiple sites with the use of a computer network while the database is

maintained at the central site.

Disadvantages of Centralized Database System

• When the central site computer or database system goes down, then every one (users) is blocked

from using the system until the system comes back. • Communication costs from the terminals to

the central site can be expensive.

b)Parallel Database System:


Parallel database system architecture consists of a multiple Central Processing Units (CPUs) and

data storage disk in parallel. Hence, they improve processing and Input/Output (I/O) speeds.

Parallel database systems are used in the application that have to query extremely large databases

or that have to process an extremely large number of transactions per second.

Advantages of a Parallel Database System

• Parallel database systems are very useful for the applications that have to query extremely large

databases (of the order of terabytes, for example, 1012 bytes) or that have to process an extremely

large number of transactions per second (of the order of thousands of transactions per second). •

In a parallel database system, the throughput (that is, the number of tasks that can be completed in

a given time interval) and the response time (that is, the amount of time it takes to complete a

single task from the time it is· submitted) are very high.

Disadvantages of a Parallel Database System

• In a parallel database system, there· is a startup cost associated with initiating a single process

and the startup-time may overshadow the actual processing time, affecting speedup adversely. •

Since process executing in a parallel system often access shared resources, a slowdown may result

from interference of each new process as it completes with existing processes for commonly held

resources, such as shared data storage disks, system bus and so on.

c)Distributed Database System:

A logically interrelated collection of shared data physically distributed over a computer network

is called as distributed database and the software system that permits the management of the

distributed database and makes the distribution transparent to users is called as Distributed
DBMS. It consists of a single logical database that is split into a number of fragments. Each
fragment is stored on one or more computers under the control of a separate DBMS, with the

computers connected by a communications network. As shown, in distributed database system,

data is spread across a variety of different databases. These are managed by a variety of different

DBMS software running on a variety of different operating systems. These machines are spread

(or distributed) geographically and connected together by a variety of communication

networks.

Advantages of Distributed Database System

• Distributed database architecture provides greater efficiency and better performance. • A single

database (on server) can be shared across several distinct client (application) systems. • As data

volumes and transaction rates increase, users can grow the system incrementally. • It causes less

impact on ongoing operations when adding new locations. • Distributed database system provides

local autonomy. Disadvantages of Distributed Database System • Recovery from failure is

more complex in distributed database systems than in centralized systems.

d)Client-Server DBMS:

Client/Server architecture of database system has two logical components namely client, and

server. Clients are generally personal computers or workstations whereas server is large

workstations, mini range computer system or a mainframe computer system. The applications and

tools of DBMS run on one or more client platforms, while the DBMS soft wares reside on the

server. The server computer is caned backend and the client’s computer is called front end. These
server and client computers are connected into a network. The applications and tools act as clients

of the DBMS, making requests for its services. The DBMS, in turn, processes these requests and
returns the results to the client(s). Client/Server architecture handles the Graphical User Interface

(GUI) and does computations and other programming of interest to the end user. The server

handles parts of the job that are common to many clients, for example, database access and updates.

Multi-Tier client server computing models

In a single-tier system the database is centralized, which means the DBMS Software and the data

reside in one location and the dumb terminals were used to access the DBMS as

shown. The rise of

personal computers in businesses during the 1980s, the increased reliability of networking

hardware causes Two-tier and Three-tier systems became common. In a two-tier system, different

software is required for the server and for the client. Illustrates the two-tier client server model. At

the early stages client server computing model was called two-tier-computing model in which

client is considered as data capture and validation tier and Server was considered as data storage

tier. This scenario is depicted.


Problems of two-tier architecture

The need of enterprise scalability challenged this traditional two-tier client-server model. In the

mid-1990s, as application became more complex and could be deployed to hundreds or thousands

of end-users, the client side, now undergoes with following

problems: • A’ fat’
client requiring considerable resources on client’s computer to run effectively. This includes disk

space, RAM and CPU. • Client machines require administration which results overhead.

Three-tier architecture

By 1995, three-tier architecture appears as improvement over two-tier architecture. It has three

layers, which are: • First Layer: User Interface which runs on end-user’s computer (the client) . •

Second Layer: Application Server It is a business logic and data processing layer. This middle

tier runs on a server which is called as Application Server. • Third Layer: Database Server It is a

DBMS, which stores the data required by the middle tier. This tier may run on a separate server

called the database server. As, described earlier, the client is now responsible for application’s user

interface, thus it requires less computational resources now clients are called as ‘thin client’ and it

requires less maintenance.

Advantages of Client/Server Database System

• Client/Server system has less expensive platforms to support applications that had previously

been running only on large and expensive mini or mainframe computers • Client offer icon-based
menu-driven interface, which is superior to the traditional command-line, dumb terminal interface
typical of mini and mainframe computer systems. • Client/Server environment facilitates in more

productive work by the users and making better use of existing

data.

• Client/Server database system is more flexible as compared to the Centralized system. •

Response time and throughput is high. • The server (database) machine can be custom-built

(tailored) to the DBMS function and thus can provide a better DBMS performance. • The client

(application database) might be a personnel workstation, tailored to the needs of the end users and

thus able to provide better interfaces, high availability, faster responses and overall improved ease

of use to the user. + A single database (on server) can be shared across several distinct client

(application) systems.

Disadvantages of Client/Server Database System

• Programming cost is high in client/server environments, particularly in initial phases. • There is

a lack of management tools for diagnosis, performance monitoring and tuning and security control,

for the DBMS, client and operating systems and networking environments.
Database Languages:
A DBMS must provide appropriate languages and interfaces for each category of users to express

database queries and updates. Database Languages are used to create and maintain database

on computer. There are large numbers of database languages like Oracle, MySQL, MS Access,

dBase, FoxPro etc. SQL statements commonly used in Oracle and MS Access can be categorized

as data definition language (DDL), data control language (DCL) and data manipulation language

(DML).

Types
a)Data Definition Language (DDL):

It is a language that allows the users to define data and their relationship to other types of data. It

is mainly used to create files, databases, data dictionary and tables within databases.It is also

used to specify the structure of each table, set of associated values with each attribute, integrity

constraints, security and authorization information for each table and physical storage structure
of each table on disk.

b)Data Manipulation Language (DML):

It is a language that provides a set of operations to support the basic data manipulation operations

on the data held in the databases. It allows users to insert, update, delete and retrieve data from

the database. The part of DML that involves data retrieval is called a query language.

c)Data Control Language (DCL):


DCL statements control access to data and the database using statements such as GRANT and

REVOKE. A privilege can either be granted to a User with the help of GRANT statement. The

privileges assigned can be SELECT, ALTER, DELETE, EXECUTE, INSERT, INDEX etc. In

addition to granting of privileges, you can also revoke (taken back) it by using REVOKE

command.

File Processing System:


File processing systems was an early attempt to computerize the manual filing system that we are

all familiar with. A file system is a method for storing and organizing computer files and the data

they contain to make it easy to find and access them. File systems may use a storage device such

as a hard disk or CD-ROM and involve maintaining the physical location of the files. In our own

home, we probably have some sort of filing system, which contains receipts, guarantees,

invoices, bank statements, and such like. When we need to look something up, we go to the

filing system and search through the system starting from the first entry until we find what we

want. Alternatively, we may have an indexing system that helps to locate what we want more

quickly. For example we may have divisions in the filing system or separate folders for different

types of item that are in some way logically related.The manual filing system works well when
the number of items to be stored is small. It even works quite adequately when there are large

numbers of items and we have only to store and retrieve them. However, the manual filing

system breaks down when we have to cross-reference or process the information in the files. For

example, a typical real estate agent’s office might have a separate file for each property for sale

or rent, each potential buyer and renter, and each member of staff. Clearly the manual system is

inadequate for this’ type of work. The file based system was developed in response to the needs

of industry for more efficient data access. In early processing systems, an organization’s

information was stored as groups of records in separate files.In the traditional approach, we used
to store information in flat files which are maintained by the file system under the operating
system’s control. Here, flat files are files containing records having no structured relationship

among them. The file handling which we learn under C/C ++ is the example of file processing

system. The Application programs written in C/C ++ like programming languages go through the

file system to access these flat.

Characteristics of File Processing System:


• It is a group of files storing data of an organization. • Each file is independent from one

another. • Each file is called a flat file. • Each file contained and processed information for one

specific function, such as accounting or inventory. • Files are designed by using programs

written in programming languages such as COBOL, C, C++. • The physical implementation and

access procedures are written into database application; therefore, physical changes resulted in

intensive rework on the part of the programmer. • As systems became more complex, file

processing systems offered little flexibility, presented many limitations, and were difficult to

maintain.

Limitations of the File Processing System I File-


Based Approach:
1.Separated and Isolated Data: To make a decision, a user might need data from two separate

files. First, the files were evaluated by analysts and programmers to determine the specific data

required from each file and the relationships between the data and then applications could be

written in a programming language to process and extract the needed data. Imagine the work

involved if data from several files was needed.

2.Duplication of data: Often the same information is stored in more than one file. Uncontrolled
duplication of data is not required for several reasons, such as: • Duplication is wasteful. It costs
time and money to enter the data more than once • It takes up additional storage space, again

with associated costs. • Duplication can lead to loss of data integrity; in other words the data is

no longer consistent. For example, consider the duplication of data between the Payroll and

Personnel departments. If a member of staff moves to new house and the change of address is

communicated only to Personnel and not to Payroll, the person’s pay slip will be sent to the

wrong address. A more serious problem occurs if an employee is promoted with an associated

increase in salary. Again, the change is notified to Personnel but the change does not filter

through to Payroll. Now, the employee is receiving the wrong salary. When this error is detected,
it will take time and effort to resolve. Both these examples, illustrate inconsistencies that may

result from the duplication of data. As there is no automatic way for Personnel to update the data

in the Payroll files, it is difficult to foresee such inconsistencies arising. Even if Payroll is

notified of the changes, it is possible that the data will be entered incorrectly.

3. Data Dependence: In file processing systems, files and records were described by specific

physical formats that were coded into the application program by programmers. If the format of a

certain record was changed, the code in each file containing that format must be updated.

Furthermore, instructions for data storage and access were written into the application’s code.

Therefore, .changes in storage structure or access methods could greatly affect the processing or

results of an application. In other words, in file based approach application programs are data
dependent. It means that, with the change in the physical representation (how the data is

physically represented in disk) or access technique (how it is physically accessed) of data,

application programs are also affected and needs modification. In other words application

programs are dependent on the how the data is physically stored and accessed.

4. Difficulty in representing data from the user’s view: To create useful applications for the

user, often data from various files must be combined. In file processing it was difficult to
determine relationships between isolated data in order to meet user requirements.
5. Data Inflexibility: Program-data interdependency and data isolation, limited the flexibility of

file processing systems in providing users with ad-hoc information requests

6. Incompatible file formats: As the structure of files is embedded in the application programs,

the structures are dependent on the application programming language. For example, the

structure of a file generated by a COBOL program may be different from the structure of a file

generated by a ‘C’ program. The direct incompatibility of such files makes them difficult to

process jointly.

7. Data Security. The security of data is low in file based system because, the data is maintained

in the flat file(s) is easily accessible. For Example: Consider the Banking System. The Customer

Transaction file has details about the total available balance of all customers. A Customer wants

information about his account balance. In a file system it is difficult to give the Customer access

to only his data in the· file. Thus enforcing security constraints for the entire file or for certain

data items are difficult.

8. Transactional Problems. The File based system approach does not satisfy transaction

properties like Atomicity, Consistency, Isolation and Durability properties commonly known as

ACID properties.

9. Concurrency problems. When multiple users access the same piece of data at same interval

of time then it is called as concurrency of the system. When two or more users read the data

simultaneously there is ll( problem, but when they like to update a file simultaneously, it may

result in a problem.

10. Poor data modeling of real world. The file based system is not able to represent the

complex data and interfile relationships, which results poor data modeling properties.
Entity-Relationship Model:
In Entity-Relationship model a database is modeled as a collection of entities and relationship

among entities. The ER model views the real world as a construct of entities and association

between entities.

Entities
An entity is an object whose information is stored in the database. It is distinguishable from other

objects. For example: specific person, company, event, plant. In other words, any thing that may

‘have an independent existence and about which we intend to collect data is known as Entity. It

is also known as Entity type. Entities are the principal data object about which information is to

be collected. Entities are usually recognizable concepts, either concrete or abstract, such as

person, places, things, or events, which have relevance to the database. Some specific examples

of entities are EMPLOYEES, PROJECTS, and INVOICES. An entity is analogous to a table in

the relational model. Entities are classified as independent or dependent (in some methodologies,

the terms used are strong and weak, respectively). An independent entity is one that does not rely

on another for identification. A dependent entity is one that relies on another for identification.
An entity occurrence (also called an instance) is an individual occurrence of an entity. An

occurrence is analogous to a row in the relational table.

Entity Set
An entity set is a set of entities of the same type that share the same properties. For example: set

of all persons, companies, trees, holidays. In object oriented terminology entity is considered as
an object and entity set is considered as a class. For example, Student is an entity set or class and

a particular student is an entity or an object.

Special Entity Types


Associative entities (also known as intersection entities) are entities used to associate two or

more entities in order to reconcile a many-to-many relationship. Subtypes entities are used in

generalization hierarchies to represent a subset of instances of their parent entity, called the super

type, but which have attributes or relationships that apply only to the subset. Associative entities

and generalization hierarchies are discussed in more detail in next sections.

Relationships
A Relationship represents an association between two or more entities. Relationships are

classified in terms of degree, connectivity, cardinality, and existence. An example of a

relationship would be: • Employees are assigned to projects • Projects have subtasks •

Departments manage one or more projects

Attributes
Attributes describe the properties of the entity of which they are associated. A particular instance

of an attribute is a value. For example, “Ram” is one value of the attribute Name. The domain of

an attribute is the collection of all possible values an attribute can have. The domain of Name is a

character string.

Attribute
We can classify attributes as following types: • Simple • Composite • Single-values • Multi-

values • DerivedSimple Attribute: A simple attribute is an attribute composed of a single

component with an independent existence. Simple attributes cannot be further subdivided.

Examples of simple attributes include Sex, Age, and Salary etc. Simple attributes are sometimes

called atomic attributes. Composite Attribute: An attribute composed of multiple components,

each with an independent existence is called a composite attribute. Some attributes can be further

divided to yield smaller components with an independent existence of their own. For example,

the Address attribute can be composed of components like Street number, Area, City, Pin code
and so on. The decision to model the Address, Area, and City etc. is dependent on whether the

user view of the model refers to the Address attribute as a single unit or as individual

components. Composite attributes are

shown.

Si

ngle-valued Attribute: A single-valued attribute is one that holds a single value for a single

entity. The majority of attributes are single-valued for a particular entity. For example, the

Classroom entity has as single value for the room_number attribute and therefore the

room_number attribute is referred to as being single-valued. Multi-valued Attribute: A multi-

valued attribute is one that holds multiple values for a single entity. Some attribute has multiple

values for a particular entity. For example, a student entity can have multiple values for the
Hobby attribute-reading, music, movies and so on. A multi-valued attribute may have set of
numbers with upper and lower limits. For example, the Hobby attribute of a Student may have

between one and five values. In other words, a student may have a minimum of one hobby and

maximum of 5 hobbies. Derived Attribute: A derived attribute is one that represents a value

that is derivable from the value of a related attribute or set of attributes, not necessarily in the

same entity. Some attributes may be related for a particular entity. For example the Age attribute

can be derived from the date-of-birth attribute and therefore they are related. We refer the age

attribute as a derived attribute, the value of which is derived from the date-of-birth

attribute.

Degreeof a Relationship
The degree of a relationship is the number of entities associated with the relationship. The n-

ary relationship is the general form for degree n. Special cases are the binary, and ternary, where

the degree is 2, and 3, respectively. Binary relationships, the association between two entities are

the most common type in the real world. A recursive binary relationship occurs when an entity is

related to itself. An example might be “some employees are married to other employees”. A

ternary relationship involves three entities and is used when a binary relationship is inadequate.

Many modeling approaches recognize only binary relationships. Ternary orn-ary relationships

are decomposed into two or more binary relationships.


Connectivity and Cardinality
The connectivity of a relationship describes the mapping of associated entity instances in the

relationship. The values of connectivity are “one” or “many”. The cardinality of a relationship is

the actual number of related occurrences for each of the two entities. The basic types of

connectivity for relations are: • One to One (1: 1). • One to Many (1:M) • Many to One (M:l) •

Many to Many (M:M) A one-to-one (1:1) relationship is when at most one instance of an entity

A is associated with one instance of entity B. For example, “employees in the company are each

assigned their own office. For each employee there exists a unique office and for each office

there exists a unique employee. A one-to-many (1:M) relationships is when for one instance of

entity A, there are zero, one, or many instances of entity B, but for one instance of entity B, there

is only one instance of entity A. Examples of l:M relationships are: • A department has many

employees. • Each employee is assigned to one department. A many-to-one (M: 1) relationships

is when for one instance of entity A is associated with at most one instances of entity B, but for

one instance of entity B, there may be any number of instances of entity A. Examples of M: 1

relationships is • Many employees one department. A many-to-many (M: M) relationship,

sometimes called non-specific, is when for one instance of entity A, there are zero, one, or many

instances of entity B and for one instance of entity B there are zero, one, or many instances of
entity A. Examples are: • Employees can be assigned to no more than two projects at the same

time. • Projects must have assigned at least three employees. A single employee can be assigned

to many projects; conversely, a single project can have assigned to it many employees. Here, the

cardinality for the relationship between employees and projects is two and the cardinality

between project and employee is three. Many-to-many relationships cannot be directly translated

to relational tables but instead must be transformed into two or more one-to-many relationships

using associative entities.

Direction
The direction of a relationship indicates the originating entity of a relationship. The entity from

which a relationship originates is the parent entity; the entity where the relationship terminates is

the child entity. The type of the relation is determined by the direction of line connecting

relationship component and the entity. To distinguish different types of relation, we draw either a

directed line or an undirected line between the relationship set and the entity set. Directed line is

used to indicate one occurrence and undirected line is used to indicate many occurrences in a

relation as shown in next case. To illustrate these different types of relationships consider the

following entity sets: DEPARTMENT, MANAGER, EMPLOYEE, PROJECT. The


relationship between a DEPARTMENT and a MANAGER is usually one-to-one; there is only

one manager per department and a manager manages only one department. This relationship

between entities is shown in figure 4.3. Each entity is represent by a rectangle and a direct line

indicates the relationship between them. The relationship for MANAGER to DEPARTMENT

and from DEPARTMENT to MANAGER is both 1:1. Note that a one to one relationship

between two entity set does not imply that for an occurrence of an entity from one set at any time

there must be an occurrence of an entity in the other set. In the case of an organization, there

could be times when a department is without a manager or when an employee who is classified

as a manager may be without a department to manage. Some instance of one to one relationship

between the entities DEP ARTMENT and

MANAGER.
A one to many,

relationship exists from the entity MANAGER to the entity EMPLOYEE because there are

several employees reporting to the manager. As we just pointed out, there could be an occurrence

of the entity type MANAGER having zero occurrences of the entity type EMPLOYEE reporting

to him or her. A reverse relationship, from EMPLOYEE to MANAGER, would be many to one,

since a single manager may supervise many employees However, given an instance of the entity

set EMPLOYEE, there could be only one instance of the entity set MANAGER to whom that

employee reports (assuming that no employee reports to more than one manager). The

relationship between entities is illustrated and figure shows some instances of this

relationship.

The relationship

between the entity EMPLOYEE and the entity PROJECT can be derived as follows: Each

employee could be involved in a number of different projects, and a number of employees could
be working on a given project. This relationship Between EMPLOYEE and PROJECT is many
to many. It is illustrated and shows some instances of such a

relationship.

Entity Relationship Diagram:

1. Gathering information: This could be a written document that describes the system in question

with reasonable amount of details.

2. Producing ERD: ERD or Entity Relationship Diagram is a diagrammatic representation of the

description we have gathered about the system.

3. Designing the database: Out of the ERD we have created, it is very easy to determine the tables,

the attributes which the tables must contain and the relationship among these tables.

4. Normalization: This is a process of removing different kinds of impurities from the tables we

have just created in the above step.


How to Prepare an ERD

Step 1 Let us take a very simple example and we try to reach a fully organized database from it.

Let us look at the following simple statement: A boy eats an ice cream. This is a description of a

real word activity, and we may consider the above statement as a written document (very short, of

course).

Step 2 Now we have to prepare the ERD. Before doing that we have to process the statement a

little. We can see that the sentence contains a subject (boy), an object (ice cream) and a verb (eats)

that defines the relationship between the subject and the object. Consider the nouns as entities

(boy and ice cream) and the verb (eats) as a relationship. To plot them in the diagram, put the

nouns within rectangles and the relationship within a diamond. Also, show the relationship with a

directed arrow, starting from the subject entity (boy) towards the object entity (ice

cream).

Well, fine. Up to this point the ERD shows how boy and ice cream are related. Now, every boy

must have a name, address, phone number etc. and every ice cream has a manufacturer, flavor,

price etc. Without these the diagram is not complete. These items which we mentioned here are

known as attributes, and they must be incorporated in the ERD as connected

ovals.
But can only entities have attributes? Certainly not. If we want then the relationship must have

their attributes too. These attribute do not inform anything more either about the boy or the ice

cream, but they provide additional information about the relationships between the boy and the ice

cream.

Step 3 We are almost complete now. If you look carefully, we now have defined structures for at

least three tables like the


following:

However, this is still not a working database, because by

definition, database should be “collection of related tables.” To make them connected, the tables

must have some common attributes. If we chose the attribute Name of the Boy table to play the

role of the common attribute, then the revised structure of the above tables become something like

the following. This is as complete

as it can be. We now have information about the boy, about the ice cream he has eaten and about

the date and time when the eating was done.


Cardinality of Relationship While creating relationship between two entities, we may often need

to face the cardinality problem. This simply means that how many entities of the first set are related

to how many entities of the second set. Cardinality can be of the following three types.

One-to-One Only one entity of the first set is related to only one entity of the second set. E.g. A

teacher teaches a student. Only one teacher is teaching only one student. This can be expressed in

the followingdiagram

as:

One-to-Many Only one entity of the first set is related to multiple entities of the second set.

E.g. A teacher teaches students. Only one teacher is teaching many students. This can be expressed

in the followingdiagram

as:

Many-to-One Multiple entities of the first set are related to multiple entities of the second set.

E.g. Teachers teach a student. Many teachers are teaching only one student. This can be expressed

in the followingdiagram

as:
Many-to-Many Multiple entities of the first set is related to multiple entities of the second set.

E.g. Teachers teach students. In any school or college many teachers are teaching many students.

This can be consideredas a two way one-to-many relationship. This can be expressed in the

following diagram

as: In

this discussion we have not included the attributes, but you can understand that they can be used

without any problem if we want to.

The Concept of Keys

A key is an attribute of a table which helps to identify a row. There can be many different types of

keys which are explained here.

Super Key or Candidate Key: It is such an attribute of a table that can uniquely identify a row in

a table. Generally they contain unique values and can never contain NULL values. There can be

more than one super key or candidate key in a table e.g. within a STUDENT table Roll and Mobile
No. can both serve to uniquely identify a student.

Primary Key: It is one of the candidate keys that are chosen to be the identifying key for the entire

table. E.g. although there are two candidate keys in the STUDENT table, the college would

obviously use Roll as the primary key of the table.

Alternate Key: This is the candidate key which is not chosen as the primary key of the table. They

are named so because although not the primary key, they can still identify a row.
Composite Key: Sometimes one key is not enough to uniquely identify a row. E.g. in a single

class Roll is enough to find a student, but in the entire school, merely searching by the Roll is not

enough, because there could be 10 classes in the school and each one of them may contain a certain

roll no 5. To uniquely identify the student we have to say something like “class VII, roll no 5”. So,

a combination of two or more attributes is combined to create a unique combination of values,

such as Class + Roll.

Foreign Key: Sometimes we may have to work with an attribute that does not have a primary key

of its own. To identify its rows, we have to use the primary attribute of a related table. Such a copy

of another related table’s primary key is called foreign key.

Strong and Weak Entity Based on the concept of foreign key, there may arise a situation when

we have to relate an entity having a primary key of its own and an entity not having a primary key

of its own. In such a case, the entity having its own primary key is called a strong entity and the

entity not having its own primary key is called a weak entity. Whenever we need to relate a strong

and a weak entity together, the ERD would change just a little. Say, for example, we have a

statement “A Student lives in a Home.” STUDENT is obviously a strong entity having a primary

key Roll. But HOME may not have a unique primary key, as its only attribute Address may be

shared by many homes (what if it is a housing estate?). HOME is a weak entity in this case. The

ERD of this statement would be like the

following

As you can see, the weak entity itself and the relationship linking a strong and weak entity must
have double border.
Different Types of Database

There are three different types of data base. The difference lies in the organization of the database

and the storage structure of the data. We shall briefly mention them here.

Relational DBMS

This is our subject of study. A DBMS is relational if the data is organized into relations, that is,

tables. In RDBMS, all data are stored in the well-known row-column format.

Hierarchical DBMS

In HDBMS, data is organized in a tree like manner. There is a parent-child relationship among

data items and the data model is very suitable for representing one-to-many relationship. To access

the data items, some kind of tree-traversal techniques are used, such as preorder traversal. Because

HDBMS is built on the one-to-many model, we have to face a little bit of difficulty to organize a

hierarchical database into row column format. For example, consider the following hierarchical

database that shows four employees (E01, E02, E03, and E04) belonging to the same department

D1. There are two ways to

represent the above one-to-many information into a relation that is built in one-to-one relationship.

The first is called Replication, where the department id is replicated a number of times in the table
like the following. Replication makes the same data

item redundant and is an inefficient way to store data. A better way is to use a technique called

the Virtual Record. While using this, the repeating data item is not used in the table. It is kept at

a separate place. The table, instead of containing the repeating information, contains a pointer to

that place where the data item is

stored. This

organization saves a lot of space as data is not made redundant. Network DBMS The NDBMS
is built primarily on a one–to-many relationship, but where a parent-child representation among

the data items cannot be ensured. This may happen in any real world situation where any entity

can be linked to any entity. The NDBMS was proposed by a group of theorists known as

the Database Task Group (DBTG). What they said looks like this: In NDBMS, all entities are

called Records and all relationships are called Sets. The record from where the relationship starts

is called the Owner Record and where it ends is called Member Record. The relationship or set

is strictly one-to-
many. In case

we need to represent a many-to-many relationship, an interesting thing happens. In NDBMS,

Owner and Member can only have one-to-many relationship. We have to introduce a third common

record with which both the Owner and Member can have one-to-many relationship. Using this

common record, the Owner and Member can be linked by a many-to-many relationship. Suppose

we have to represent the statement Teachers teach students. We have to introduce a third record,

suppose CLASS to which both teacher and the student can have a many-to-many relationship.

Using the class in the middle, teacher and student can be linked to a virtual many-to many

relationship.

You might also like