UNIT 1: Database System Concepts & Architecture: o o o o o o

Download as pdf or txt
Download as pdf or txt
You are on page 1of 18

1

UNIT 1: Database System Concepts & Architecture


What is DBMS?
Database Management Systems (DBMS) are software systems used to store, retrieve, and run queries on
data. A DBMS serves as an interface between an end-user and a database, allowing users to create, read,
update, and delete data in the database.
DBMS manage the data, the database engine, and the database schema, allowing for data to be
manipulated or extracted by users and other programs. This helps provide data security, data integrity,
concurrency, and uniform data administration procedures.
Advantages of DBMS
o Controls database redundancy: It can control data redundancy because it stores all the data in
one single database file and that recorded data is placed in the database.
o Data sharing: In DBMS, the authorized users of an organization can share the data among multiple
users.
o Easily Maintenance: It can be easily maintainable due to the centralized nature of the database
system.
o Reduce time: It reduces development time and maintenance need.
o Backup: It provides backup and recovery subsystems which create automatic backup of data
from hardware and software failures and restores the data if required.
o multiple user interface: It provides different types of user interfaces like graphical user interfaces,
application program interfaces

Data Abstraction is a process of hiding unwanted or irrelevant details from the end user. It provides a
different view and helps in achieving data independence which is used to enhance the security of data.
The database systems consist of complicated data structures and relations. For users to access the data
easily, these complications are kept hidden, and only the relevant part of the database is made accessible
to the users through data abstraction.
Database Language
o A DBMS has appropriate languages and interfaces to
express database queries and updates.
o Database languages can be used to read, store and
update the data in the database.
Types of Database Language
1. Data Definition Language
o DDL stands for Data Definition Language. It is used to define database structure or pattern.
o It is used to create schema, tables, indexes, constraints, etc. in the database.
o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the number of tables and
schemas, their names, indexes, columns in each table, constraints, etc.
Here are some tasks that come under DDL:
o Create: It is used to create objects in the database.
o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.
o Comment: It is used to comment on the data dictionary.
2. Data Manipulation Language
DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a database.
It handles user requests.
Here are some tasks that come under DML:
o Select: It is used to retrieve data from a database.
o Insert: It is used to insert data into a table.
o Update: It is used to update existing data within a table.
o Delete: It is used to delete all records from a table.
o Merge: It performs UPSERT operation, i.e., insert or update operations.
o Call: It is used to call a structured query language or a Java subprogram.
o Explain Plan: It has the parameter of explaining data.
2
o Lock Table: It controls concurrency.
3. Data Control Language
o DCL stands for Data Control Language. It is used to retrieve the stored or saved data.
o The DCL execution is transactional. It also has rollback parameters.
(But in Oracle database, the execution of data control language does not have the feature of rolling back.)
Here are some tasks that come under DCL:
o Grant: It is used to give user access privileges to a database.
o Revoke: It is used to take back permissions from the user.
There are the following operations which have the authorization of Revoke:
CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.
4. Transaction Control Language
TCL is used to run the changes made by the DML statement. TCL can be grouped into a logical
transaction.
Here are some tasks that come under TCL:
o Commit: It is used to save the transaction on the database.
o Rollback: It is used to restore the database to original since the last Commit.

What is Data Independence in DBMS?


Data independence is the ability to modify the scheme without affecting the programs and the application to
be rewritten. Data is separated from the programs, so that the changes made to the data will not affect the
program execution and the application.
There are two levels of data independence based on three levels of abstraction. These are as follows –
Physical Data Independence
Physical Data Independence means changing the physical level without affecting the logical level or
conceptual level. Using this property, we can change the storage device of the database without affecting
the logical schema.
Logical Data Independence
Logical view of data is the user view of the data. It presents data in the form that can be accessed by the
end users.

Three-level ANSI SPARC Database Architecture


The Architecture of most of commercial DBMS are available today is mostly based on this ANSI-SPARC
database architecture.
ANSI SPARC THREE-TIER architecture has main three levels:
These three levels provide data abstraction; means hide the low-level complexities from end users.
A database system should be efficient in performance and convenient in use.
Using these three levels, it is possible to use complex structures at internal level for efficient operations and
to provide simpler convenient interface at external level.
External level:
It is the view how the user views the database. The data of the database that is relevant to that user is
described at this level. The external level consists of several different external views of the database. In the
external view only those entities, attributes, and relationships are included that the user wants. The different
views may have different ways of representing the same data. For example, one user may view name in
the form (first name, last name), while another may view as (last name, first name).
Conceptual level:
It is the community view of the database and describes what data is stored in the database and represents
the entities, their attributes, and their relationships. It represents the semantic, security, and integrity
information about the data. The middle-level or the second-level in the three-level architecture is the
conceptual level. This level contains the logical structure of the entire database, it represents the complete
view of the database that the organization demands independent of any storage consideration.
Internal level:
At the internal level, the database is represented physically on the computer. It emphasizes the physical
implementation of the database to do storage space utilization and to achieve the optimal runtime
performance, and data encryption techniques. It interfaces with the operating system to place the data on
storage files and build the storage space, retrieve the data, etc.
3
Components of DBMS
The database management system can be divided into five major components, they are:
Hardware
When we say Hardware, we mean computer, hard disks, I/O channels for data, and any other physical
component involved before any data is successfully stored into the memory.
When we run Oracle or MySQL on our personal computer, then our computer's Hard Disk, our Keyboard
using which we type in all the commands, our computer's RAM, ROM all become a part of the DBMS
hardware.
Software
This is the main component, as this is the program which controls everything. The DBMS software is more
like a wrapper around the physical database, which provides us with an easy-to-use interface to store,
access and update data.
The DBMS software is capable of understanding the Database Access Language and interpret it into actual
database commands to execute them on the DB.
Data: Data is that resource, for which DBMS was designed. The motive behind the creation of DBMS was
to store and utilise data.
In a typical Database, the user saved Data is present and meta data is stored.
Metadata is data about the data. This is information stored by the DBMS to better understand the data
stored in it.
For example: When I store my Name in a database, the DBMS will store when the name was stored in the
database, what is the size of the name, is it stored as related data to some other data, or is it independent,
all this information is metadata.
Procedures
Procedures refer to general instructions to use a database management system. This includes procedures
to setup and install a DBMS, to login and logout of DBMS software, to manage databases, to take backups,
generating reports etc.
Database Access Language
Database Access Language is a simple language designed to write commands to access, insert, update
and delete data stored in any database.
A user can write commands in the Database Access Language and submit it to the DBMS for execution,
which is then translated and executed by the DBMS.
User can create new databases, tables, insert data, fetch stored data, update data and delete the data
using the access language.

Functions and responsibilities of DBAs


1. Schema Definition:
 The DBA definition the logical Schema of the database. A Schema refers to the overall logical
structure of the database.
 According to this schema, database will be developed to store required data for an organization.
2. Storage Structure and Access Method Definition:
 The DBA decides how the data is to be represented in the stored database.
3. Assisting Application Programmers:
 The DBA provides assistance to application programmers to develop application programs.
4. Physical Organization Modification:
 The DBA modifies the physical organization of the database to reflext the changing needs of the
organization or to improve performance.
5. Approving Data Access:
 The DBA determines which user needs access to which part of the database.
 According to this, various types of authorizations are granted to different users.
6. Monitoring Performance:
 The DBA monitors performance of the system. The DBA ensures that better performance is
maintained by making changes in physical or logical schema if required.
7. Backup and Recovery:
 Database should not be lost or damaged.
 The DBA ensures this periodically backing up the database on magnetic tapes or remote servers.
 In case of failure, such as virus attack database is recovered from this backup.
4
 Role of DBA
 Creating and maintaining database standards and policies
 Supporting database design, creation, and testing activities
 Managing the database availability and performance, including incident and problem management
 Administering database objects to achieve optimum utilization
 Defining and implementing event triggers that will alert on potential database performance or
integrity issues
 Performing database housekeeping, such as tuning, indexing, etc.
 Monitoring usage, transaction volumes, response times, concurrency levels, etc.

What Does Data Dictionary Mean?


A data dictionary is a file or a set of files that contains a database's metadata. The data dictionary contains
records about other objects in the database, such as data ownership, data relationships to other objects,
and other data.
Types of Data Dictionary
Here are the two types of data dictionary −
Active Data Dictionary
The DBMS software manages the active data dictionary automatically. The modification is an automatic
task and most RDBMS has active data dictionary. It is also known as integrated data dictionary.
Passive Data Dictionary
Managed by the users and is modified manually when the database structure change. Also known as non-
integrated data dictionary

UNIT 2: Data Modelling Technique


Data Models are fundamental entities to introduce abstraction in a DBMS. Data models define how data is
connected to each other and how they are processed and stored inside the system.
The very first data model could be flat data-models, where all the data used are to be kept in the same
plane.

Types of Data Model


Hierarchical Model
The hierarchical data model is one of the oldest data models, developed in the 1950s by IBM. In this data
model, the data is organized in a hierarchical tree-like structure. This data model can be easily visualized
because each record has one parent and many children (possibly 0) as shown in the image given below.

The above given image represents the data model of the


Vehicle database, vehicle is classified into two types Viz.
two-wheelers and four-wheelers and then they are further
classified. The main drawback we can see here is we can
only have one to many relationships under this model, hence
the hierarchical data model is very rarely used nowadays

Network Model
A network model is nothing but a generalization of the hierarchical data model as this data model allows
many to many relationships therefore in this model a record can also have more than one parent.
5
The network model can be represented as a graph and hence it replaces the hierarchical tree with a graph
in which object types are the nodes and relationships are the edges.
For example -

Here you can see all the three departments are linked with
the director which was not possible in the hierarchical data
model. In the network model, there can be many possible
paths to reach a node from the root node (College is the
root node in the above case), therefore the data can be
accessed efficiently when compared to the hierarchical
data model. But, on the other hand, the process of insertion
and deletion of data is quite complex.
Object-Oriented Data model
As suggested by its name, the object-oriented data model is a combination of object-oriented programming,
and relational data model. In this data model, the data and their relationship are represented in a single
structure which is known as an object. Since data is stored as objects, we can easily store audio, video,
images, etc in the database which was very difficult and inconvenient to do in the relational model. As
shown in the image below two objects are connected with each other through links.

In the above image, we have two objects that are


Employee and Department in which all the data is
contained in a single unit (object). They are linked
with each other as they share a common attribute
i.e.i.e. Department_Id.

Entity-Relationship model (ER Model)


An Entity-Relationship model is a high-level data model that describes the structure of the database in a
pictorial form which is known as ER-diagram. In simple words, an ER diagram are used to represent logical
structure of the database easily. ER model develops a conceptual view of the data hence it can be used as
a blueprint to implement the database in the future.
Developers can easily understand the system just by looking at ER diagram. Let's first have a look at the
components of an ER diagram.
Entity - Anything that has an independent existence
about which we collect the data. They are represented
as rectangles in the ER diagram. For example - Car,
house, employee.
Entity Set - A set of the same type of entities is known
as an entity set. For example - Set of students studying
in a college.
Attributes - Properties that define entities are called
attributes. They are represented by an ellipse shape.
Relationships - A relationship is used to describe the
association between entities. They are represented as
diamond or rhombus shapes in the ER diagram.
In the above-represented ER diagram, we have two entities that are Employee and
Company and the relationship among them. Also, in the above-represented ER
diagram, we can see that both employee and company have some attributes and the
relationship is of "works in" type, which means employee works in a company.
6
Enhanced Entity Relationship Model (EER Model)
EER is a high-level data model that incorporates the extensions to the original ER model.
Features of EER Model
 EER creates a design more accurate to database schemas.
 It reflects the data properties and constraints more precisely.
 It includes all modelling concepts of the ER model.
 Diagrammatic technique helps for displaying the EER schema.
 It includes the concept of specialization and generalization.
 It is used to represent a collection of objects that is union of objects of different of different entity
types.
Generalization
 Generalization is the process of generalizing the entities which contain the properties of all the
generalized entities.
 It is a bottom approach, in which two lower-level entities combine to form
a higher-level entity.
 Generalization is the reverse process of Specialization.
 It defines a general entity type from a set of specialized entity type.
 It minimizes the difference between the entities by identifying the
common features.
For example:
In the above example, Tiger, Lion, Elephant can all be generalized as
Animals.
Specialization
 Specialization is a process that defines a group entity which is divided into sub groups based on
their characteristic.
 It is a top-down approach, in which one higher entity can be broken down into two lower-level entity.
 It maximizes the difference between the members of an entity by
identifying the unique characteristic or attributes of each member.
 It defines one or more sub class for the super class and also forms
the superclass/subclass relationship.
For example
In the above example, Employee can be specialized as Developer or Tester,
based on what role they play in an organization.

Aggregation
 Aggregation is a process that represent a relationship between a whole
object and its component parts.
 It abstracts a relationship between objects and viewing the relationship
as an object.
 It is a process when two entity is treated as a single entity.

In the above example, the relation between College and Course is


acting as an Entity in Relation with Student.
7
UNIT 3: Relational Data Model in Depth

What is the Relational Model?


The relational model for database management is an approach to logically represent and manage the data
stored in a database. In this model, the data is organized into a collection of two-dimensional inter-related
tables, also known as relations. Each relation is a collection of columns and rows, where the column
represents the attributes of an entity and the rows (or tuples) represents the records.
Relational Model Concepts
As discussed earlier, a relational database is based on the relational model. This database consists of
various components based on the relational model. These include:
 Relation: Two-dimensional table used to store a collection of data elements.
 Tuple: Row of the relation, depicting a real-world entity.
 Attribute/Field: Column of the relation, depicting properties that define the relation.
 Attribute Domain: Set of pre-defined atomic values that an attribute can take i.e., it describes the
legal values that an attribute can take.
 Degree: It is the total number of attributes present in the relation.
 Cardinality: It specifies the number of entities involved in the relation i.e., it is the total number of
rows present in the relation.
 Relational Schema: It is the logical blueprint of the relation i.e.; it describes the design and the
structure of the relation. It contains the table name, its attributes, and their types:
TABLE_NAME (ATTRIBUTE_1 TYPE_1, ATTRIBUTE_2 TYPE_2, ...)
For our student relation example, the relational schema will be:
STUDENT (ROLL_NUMBER INTEGER, NAME VARCHAR (20), CGPA FLOAT)
 Relational Instance: It is the collection of records present in the relation at a given time.
 Relation Key: It is an attribute or a group of attributes that can be used to uniquely identify an entity
in a table or to determine the relationship between two tables. Relation keys can be of 6 different
types:
1. Candidate Key
2. Super Key
3. Composite Key
4. Primary Key
5. Alternate Key
6. Foreign Key

1. Primary key
o It is the first key used to identify one and only one instance of an entity uniquely. An entity can
contain multiple keys, as we saw in the PERSON table. The key which is most suitable from those
lists becomes a primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the
EMPLOYEE table, we can even select License Number and Passport Number as primary keys
since they are also unique.
o For each entity, the primary key selection is based on requirements and developers.
8
2. Candidate key
o A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
o Except for the primary key, the remaining attributes are considered a candidate key. The candidate
keys are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of the attributes, like
SSN, Passport Number, License Number, etc., are considered a candidate key.

3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a candidate key.

For example: In the above EMPLOYEE table, for


(EMPLOEE_ID, EMPLOYEE_NAME), the name of
two employees can be the same, but their EMPLYEE
_ID can't be the same. Hence, this combination can
also be a key.

The super key would be EMPLOYEE-ID


(EMPLOYEE_ID, EMPLOYEE-NAME), etc.

4. Foreign key
o Foreign keys are the column of the table used to point to the primary key of another table.
o Every employee works in a specific department in a company, and employee and department are
two different entities. So we can't store the department's information in the employee table. That's
why we link these two tables through the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the
EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
9
Integrity Constraints
o Integrity constraints are a set of rules. It is used to maintain the quality of information.
o Integrity constraints ensure that the data insertion, updating, and other processes have to be
performed in such a way that data integrity is not affected.
o Thus, integrity constraint is used to guard against accidental damage to the database.
Constraints that are applied in the data model is called Implicit constraints.
Constraints that are directly applied in the schemas of the data model, by specifying them in the DDL (Data
Definition Language). These are called as schema-based constraints or Explicit constraints.
Types of Integrity Constraint
1. Domain constraints
o Domain constraints can be defined as the definition of a valid set of values for an attribute.
o The data type of domain includes string, character, integer, time, date, currency, etc. The value of
the attribute must be available in the corresponding domain.
Example:

2. Entity integrity constraints


o The entity integrity constraint states that primary key value can't be null.
o This is because the primary key value is used to identify individual rows in relation and if the primary
key has a null value, then we can't identify those rows.
o A table can contain a null value other than the primary key field.
Example:

3. Referential Integrity Constraints


o A referential integrity constraint is specified between two tables.
o In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key of Table
2, then every value of the Foreign Key in Table 1 must be null or be available in Table 2.
Example:
10
4. Key constraints
o Keys are the entity set that is used to identify an entity within its entity set uniquely.
o An entity set can have multiple keys, but out of which one key will be the primary key. A primary key
can contain a unique and null value in the relational table.
Example:

What is database security


Database security refers to the range of tools, controls, and measures designed to establish and preserve
database confidentiality, integrity, and availability. This article will focus primarily on confidentiality since it’s
the element that’s compromised in most data breaches.
Database security must address and protect the following:
 The data in the database
 The database management system (DBMS)
 Any associated applications
 The physical database server and/or the virtual database server and the underlying hardware
 The computing and/or network infrastructure used to access the database
Database security is a complex and challenging endeavour that involves all aspects of information security
technologies and practices. It’s also naturally at odds with database usability. The more accessible and
usable the database, the more vulnerable it is to security threats; the more invulnerable the database is to
threats, the more difficult it is to access and use. (This paradox is sometimes referred to as Anderson’s
Rule. (link resides outside IBM)

Authorization
Authorization is a privilege provided by the Database Administer. Users of the database can only view the
contents they are authorized to view. The rest of the database is out of bounds to them.
The different permissions for authorizations available are:
 Primary Permission - This is granted to users publicly and directly.
 Secondary Permission - This is granted to groups and automatically awarded to a user if he is a
member of the group.
 Public Permission - This is publicly granted to all the users.
 Context sensitive permission - This is related to sensitive content and only granted to a select
users.
The categories of authorization that can be given to users are:
 System Administrator - This is the highest administrative authorization for a user. Users with this
authorization can also execute some database administrator commands such as restore or upgrade
a database.
 System Control - This is the highest control authorization for a user. This allows maintenance
operations on the database but not direct access to data.
 System Maintenance - This is the lower level of system control authority. It also allows users to
maintain the database but within a database manager instance.
 System Monitor - Using this authority, the user can monitor the database and take snapshots of it.

Database Integrity
Data integrity in the database is the correctness, consistency and completeness of data. Data integrity is
enforced using the following three integrity constraints:
 Entity Integrity - This is related to the concept of primary keys. All tables should have their own
primary keys which should uniquely identify a row and not be NULL.
11
 Referential Integrity - This is related to the concept of foreign keys. A foreign key is a key of a
relation that is referred in another relation.
 Domain Integrity - This means that there should be a defined domain for all the columns in a
database.

Availability
In order for an information system to be useful it must be available to authorized users. Availability
measures protect timely and uninterrupted access to the system. Some of the most fundamental threats to
availability are non-malicious in nature and include hardware failures, unscheduled software downtime and
network bandwidth issues. Malicious attacks include various forms of sabotage intended to cause harm to
an organization by denying users access to the information system.
The availability and responsiveness of a website is a high priority for many business. Disruption of website
availability for even a short time can lead to loss of revenue, customer dissatisfaction and reputation
damage. The Denial of Service (DoS) attack is a method frequently used by hackers to disrupt web
service. In a DoS attack, hackers flood a server with superfluous requests, overwhelming the server and
degrading service for legitimate users. Over the years, service providers have developed sophisticated
countermeasures for detecting and protecting against DoS attacks, but hackers also continue to gain in
sophistication and such attacks remain an ongoing concern.
Availability countermeasures to protect system availability are as far ranging as the threats to
availability. Systems that have a high requirement for continuous uptime should have significant hardware
redundancy with backup servers and data storage immediately available. For large, enterprise systems it is
common to have redundant systems in separate physical locations. Software tools should be in place to
monitor system performance and network traffic. Countermeasures to protect against DoS attacks include
firewalls and routers.

Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations as input and yields
instances of relations as output. It uses operators to perform queries. An operator can be either unary or
binary. They accept relations as their input and yield relations as their output. Relational algebra is
performed recursively on a relation and intermediate results are also considered relations.
The fundamental operations of relational algebra are as follows -
 Select
 Project
 Union
 Set different
 Cartesian product
 Rename
Relational Calculus
In contrast to Relational Algebra, Relational Calculus is a non-procedural query language, that is, it tells
what to do but never explains how to do it.
Relational calculus exists in two forms -
 Tuple Relational Calculus (TRC)
 Domain Relational Calculus (DRC)
12
Following are some of the important differences between Relational Algebra and Relational Calculus.
Sr. No. Key Relational Algebra Relational Calculus

Language Relational Algebra is procedural query Relational Calculus is a non-


1 Type language. procedural or declarative query
language.

Objective Relational Algebra targets how to obtain Relational Calculus targets what result
2
the result. to obtain.

Order Relational Algebra specifies the order in Relational Calculus specifies no such
3
which operations are to be performed. order of executions for its operations.

Dependency Relational Algebra is domain Relational Calculus can be domain


4
independent. dependent.

Programming Relational Algebra is close to Relational Calculus is not related to


5
Language programming language concepts. programming language concepts.
13
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
SQL
o SQL stands for Structured Query Language. It is used for storing and managing data in relational
database management system (RDMS).
o It is a standard language for Relational Database System. It enables a user to create, read, update
and delete relational databases and tables.
Rules:
SQL follows the following rules:
 Structure query language is not case sensitive. Generally, keywords of SQL are written in
uppercase.
 Statements of SQL are dependent on text lines. We can use a single SQL statement on one or
multiple text line.
 Using the SQL statements, you can perform most of the actions in a database.
 SQL depends on tuple relational calculus and relational algebra.

Advantages of SQL
There are the following advantages of SQL:
High speed
Using the SQL queries, the user can quickly and efficiently retrieve a large amount of records from a
database.
No coding needed
In the standard SQL, it is very easy to manage the database system. It doesn't require a substantial amount
of code to manage the database system.
Well defined standards
Long established are used by the SQL databases that are being used by ISO and ANSI.
Portability
SQL can be used in laptop, PCs, server and even some mobile phones.
Interactive language
SQL is a domain language used to communicate with the database. It is also used to receive answers to
the complex questions in seconds.
Multiple data view
Using the SQL language, the users can make different views of the database structure.

Create Table
The CREATE TABLE statement is used to create a new table in a database. In that table, if you want to add
multiple columns, use the below syntax.
Syntax
1. CREATE TABLE table_name (
2. column1 datatype,
3. column2 datatype,
4. column3 datatype,
5. ....
6. );
The column parameters specify the names of the columns of the table.
The data type parameter specifies the type of data the column can hold (e.g. varchar, integer, date, etc.).
Create Table Example
1. CREATE TABLE Employee(
2. EmpId int,
3. LastName varchar(255),
4. FirstName varchar(255),
5. Address varchar(255),
6. City varchar(255)
7. );
The EmpId column is of type int and will hold an integer.
14
The LastName, FirstName, Address, and City columns are of type varchar and will hold characters and the
maximum length for these fields is 255 characters.

Insert Value in this Table


The INSERT INTO statement is used to insert new records in a table.
It is possible to write the INSERT INTO statement in two ways.
Syntax
The first way specifies both the column names and the values to be inserted.
If you are adding values for all the columns of the table, then no need to specify the column names in the
SQL query. However, make sure that the order of the values is in the same order as the columns in the
table.
1. INSERT INTO table_name (column1, column2, column3, ...)
2. VALUES (value1, value2, value3, ...);
3.
4. '2nd way
5. INSERT INTO table_name
6. VALUES (value1, value2, value3, ...);
Example
Insert value in a 1st way. The column names are used here
1. INSERT INTO Employee (EmpId,LastName,FirstName,ADDRESS,City)
2. VALUES (1, 'XYZ', 'ABC', 'India', 'Mumbai' );
3. INSERT INTO Employee (EmpId,LastName,FirstName,ADDRESS,City)
4. VALUES (2, 'X', 'A', 'India', 'Pune' );
Insert value in a 2nd way.
1. INSERT INTO Employee
2. VALUES (3, 'XYZ', 'ABC', 'India', 'Mumbai' );

Select Statment in SQL


The SELECT statement is used to select data from a database.
The data returned is stored in a result table, called the result-set.
1. SELECT column1, column2, ...
2. FROM table_name;
Here, column1, column2, ... are the field names of the table you want to select from the data. If you want
to select all the fields available in the table, use the following syntax:
1. SELECT * FROM table_name;
If the above query is executed, then all record is displayed.
Example
1. Select EmpId, LastName from Employee;
2.
3. Select * from Employee;

Update Table
The UPDATE statement is used to modify the existing records in a table.
Syntax
1. UPDATE table_name
2. SET column1 = value1, column2 = value2, ...
3. WHERE condition;

Example
1. UPDATE Employee
2. SET FirstName= 'KS', City= 'Pune'
3. WHERE EmpId= 1;
15
If the above query is executed then for EmpId= 1, "Firstname" and "City" column data will be
updated.

Update Multiple Rows

It is the WHERE clause that determines how many records will be updated.
1. UPDATE Employee
2. SET City='Pune'
Delete Statment in SQL
The DELETE statement is used to delete existing records in a table for a particular Record.
Syntax
1. DELETE FROM table_name WHERE condition;
Example
1. DELETE FROM Employee WHERE EmpId=1;
In Employee table EmpId = 1 record gets deleted.

Delete All Records


It is possible to delete all rows in a table without deleting the table. This means that the table structure,
attributes, and indexes will be intact,
1. DELETE FROM table_name;
2.
3. DELETE From Employee ;
When the above query is executed, only table Data gets deleted.

Many things are not here from this unit

UNIT 5: Functional Dependencies & Normalization


Purpose of Normalization
Normalization is the process of structuring and handling the relationship between data to minimize
redundancy in the relational table and avoid the unnecessary anomalies properties from the database like
insertion, update and delete. It helps to divide large database tables into smaller tables and make a
16
relationship between them. It can remove the redundant data and ease to add, manipulate or delete table
fields.
A normalization defines rules for the relational table as to whether it satisfies the normal form. A normal
form is a process that evaluates each relation against defined criteria and removes the multivalued, joins,
functional and trivial dependency from a relation. If any data is updated, deleted or inserted, it does not
cause any problem for database tables and help to improve the relational table' integrity and efficiency.
Objective of Normalization
1. It is used to remove the duplicate data and database anomalies from the relational table.
2. Normalization helps to reduce redundancy and complexity by examining new data types used in the
table.
3. It is helpful to divide the large database table into smaller tables and link them using relationship.
4. It avoids duplicate data or no repeating groups into a table.
5. It reduces the chances for anomalies to occur in a database.
Types of Normalization
First Normal Form (1NF): The table will be in First Normal Form (1NF) if all the attributes of the table
contain only atomic values. We can also say that if a table holds the multivalued data items in attributes or
composite values, the relation cannot be in the first normal form. So, we need to make it first normal form
by making the entries of the table atomic.
Second Normal Form (2NF): A Relation will be in 2NF if it follows the following condition:
1. The table or relation should be in 1NF or First Normal Form.
2. All the non-prime attributes should be fully functionally dependent on the candidate key.
3. The table should not contain any partial dependency.
Third Normal Form (3NF): The table will be in Third Normal Form (3NF) if it follows the given conditions:
1. The table or relation should be in 2NF.
2. It should not contain any transitive dependency. A Transitive Dependency is that any non-prime
attribute determines or depends on the other non-prime attribute.
Fourth Normal Form (4 NF): A relation is said to be Fourth Normal Form (4NF) if it follows the given
conditions:
1. A table must be in BCNF.
2. There should be no multivalued dependency in the table.
3. For example, if the dependency A -> B, for a single value of A, more than one value of B exists.
Then the relation is said to be a multivalued dependency.
Fifth Normal Form (5 NF): A relation is said to be 5NF if it follows the given conditions:
1. The table should be in 4NF.
2. There should not be Join Dependency or further non-loss decomposed.
It is also known as Project Join Normal Form (PJNF).
BCNF--A stronger definition of 3NF is known as Boyce Codd's normal form.
Advantages of Normalization
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.
Disadvantages of Normalization
o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious problems.

Data redundancy occurs in a relational database when two or more rows or columns have the same value
or repetitive value leading to unnecessary utilization of the memory.
Student Table:
17

Stud Registration CourseID StudName Address Course

205 6204 James Los Angeles Economics

205 6247 James Los Angeles Economics

224 6247 Trent Bolt New York Mathematics

230 6204 Ritchie Rich Egypt Computer

230 6208 Ritchie Rich Egypt Accounts


There are two students in the above table, 'James' and 'Ritchie Rich', whose records are repetitive when we
enter a new CourseID. Hence it repeats the studRegistration, StudName and address attributes.
Update Anomalies: The anomaly occurs when duplicate data is updated only in one place and not in all
instances. Hence, it makes our data or table inconsistent state. For example, suppose there is a student
'James' who belongs to Student table. If we want to update the course in the student, we need to update
the same in the course table; otherwise, the data can be inconsistent. And it reflects the changes in a
table with updated values where some of them will not.

What is Functional Dependency?


Functional Dependency (FD) is a constraint that determines the relation of one attribute to another attribute
in a Database Management System (DBMS). Functional Dependency helps to maintain the quality of data
in the database. It plays a vital role to find the difference between good and bad database design.

A functional dependency is denoted by an arrow “→”. The functional dependency of X on Y is represented


by X → Y. Let’s understand Functional Dependency in DBMS with example.

Types of Functional Dependencies in DBMS


There are mainly four types of Functional Dependency in DBMS. Following are the types of Functional
Dependencies in DBMS:
Multivalued Dependency in DBMS
Multivalued dependency occurs in the situation where there are multiple independent multivalued attributes
in a single table. A multivalued dependency is a complete constraint between two sets of attributes in a
relation. It requires that certain tuples be present in a relation. Consider the following Multivalued
Dependency Example to understand.
Trivial Functional Dependency in DBMS
The Trivial dependency is a set of attributes which are called a trivial if the set of attributes are included in
that attribute.
So, X -> Y is a trivial functional dependency if Y is a subset of X. Let’s understand with a Trivial Functional
Dependency Example.
Non-Trivial Functional Dependency in DBMS
Functional dependency which also known as a nontrivial dependency occurs when A->B holds true where
B is not a subset of A. In a relationship, if attribute B is not a subset of attribute A, then it is considered as a
non-trivial dependency.
Transitive Dependency in DBMS
A Transitive Dependency is a type of functional dependency which happens when “t” is indirectly formed by
two functional dependencies. Let’s understand with the following Transitive Dependency Example.

What is Decomposition in DBMS?


The term decomposition refers to the process in which we break down a table in a database into various
elements or parts. Thus, decomposition replaces a given relation with a collection of various smaller
relations. Thus, in a database, we can make any table break down into multiple tables when we want to
collect a particular set of data.
Decomposition must always be lossless. This way, we can rest assured that the data/information that was
there in the original relation can be reconstructed accurately on the basis of the decomposed relations. In
18
case the relation is not decomposed properly, then it may eventually lead to problems such as information
loss.
Types of Decomposition
Decomposition is of two major types in DBMS:
1. Lossless Decomposition
A decomposition is said to be lossless when it is feasible to reconstruct the original relation R using joins
from the decomposed tables. It is the most preferred choice. This way, the information will not be lost from
the relation when we decompose it. A lossless join would eventually result in the original relation that is
very similar.
For example,
Let us take ‘A’ as the Relational Schema, having an instance of ‘a’. Consider that it is decomposed into: A1,
A2, A3, . . . . An; with instance: a1, a2, a3, . . .. an, If a1 ⋈ a2 ⋈ a3 . . . . ⋈ an, then it is known as ‘Lossless
Join Decomposition’. Read more about Lossless Decomposition in DBMS here.
2. Lossy Decomposition
Just like the name suggests, whenever we decompose a relation into multiple relational schemas, then the
loss of data/information is unavoidable whenever we try to retrieve the original relation. Read more
about Lossy Decomposition in DBMS here.
Properties of Decomposition
Decomposition must have the following properties:
1. Decomposition Must be Lossless
2. Dependency Preservation
3. Lack of Data Redundancy

You might also like