File Processing System and Its Disadvantages
File Processing System and Its Disadvantages
File Processing System and Its Disadvantages
A) The file-processing system is supported by a conventional operating system. The system stores
permanent records in various files, and it needs different application programs to extract records
from, and add records to, the appropriate files. Before database management systems (DBMSs) came
along, organizations usually stored information in such systems.
• Data redundancy and inconsistency. Since different programmers create the files and application
programs over a long period, the various files are likely to have different formats and the programs
may be written in several programming languages. Moreover, the same information may be
duplicated in several places (files).
For example, the address and telephone number of a particular customer may appear in a file that
consists of savings-account records and in a file that consists of checking-account records. This
redundancy leads to higher storage and access cost. In addition, it may lead to data inconsistency;
that is, the various copies of the same data may no longer agree.
• Difficulty in accessing data. The conventional file-processing environments do not allow needed
data to be retrieved in a convenient and efficient manner. Because without an appliation program for a
specific information, it is not possible to retrieve it.
• Data isolation. Because data are scattered in various files, and files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
• Integrity problems. The data values stored in the database must satisfy certain types of consistency
constraints. For example, the balance of a bank account may never fall below a prescribed amount
(say, $25). It is difficult to change the programs to enforce these constraints when constraints involve
several data items from different files.
• Atomicity problems. A computer system, like any other mechanical or electrical device, is subject to
failure. In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent
state that existed prior to the failure. Consider a program to transfer $50 from account A to account B.
If a system failure occurs during the execution of the program, it is possible that the $50 was removed
from account A but was not credited to account B, resulting in an inconsistent database state. Clearly,
it is essential to database consistency that either both the credit and debit occur, or that neither occur.
That is, the funds transfer must be atomic—it must happen in its entirety or not at all. It is difficult to
ensure atomicity in a conventional file-processing system.
• Concurrent-access anomalies. For the sake of overall performance of the system and faster response,
many systems allow multiple users to update the data simultaneously. In such an environment,
interaction of concurrent updates may result in inconsistent data. Consider bank account A,
containing $500. If two customers withdraw funds (say $50 and $100 respectively) from account A at
about the same time, the result of the concurrent executions may leave the account in an incorrect (or
inconsistent) state. supervision is difficult to provide because data may be accessed by many different
application programs that have not been coordinated previously.
• Security problems. Not every user of the database system should be able to access all the data. For
example, in a banking system, payroll personnel need to see only that part of the database that has
information about the various bank employees. They do not need access to information about
customer accounts. But, since application programs are added to the system in an ad hoc manner,
enforcing such security constraints is difficult.
The relational algebra is a procedural query language. It consists of a set of operations that take one or
two relations as input and produce a new relation as their result.
The fundamental operations in the relational algebra are select, project, union, set difference, Cartesian
product, and rename. In addition to the fundamental operations, there are several other operations—
namely, set intersection, natural join, division, and assignment.
Fundamental Operations
The select, project, and rename operations are called unary operations, because they operate on one
relation. The other three operations operate on pairs of relations and are called binary operations.
The select operation selects tuples that satisfy a given predicate.We use the lowercase Greek letter
sigma (σ) to denote selection. The predicate appears as a subscript to σ.
Example. to select those tuples of the loan relation where the branch is “Perryridge,”
σ amount>1200 (loan)
In general, we allow comparisons using =, _=, <, ≤, >, ≥ in the selection predicate.
Furthermore, we can combine several predicates into a larger predicate by using the connectives and
to find those tuples pertaining to loans of more than $1200 made by the Perryridge branch, we write
The project operation is a unary operation that returns its argument relation, with certain attributes
left out. Since a relation is a set, any duplicate rows are eliminated. Projection is denoted by the
uppercase Greek letter pi (Π).We list those attributes that we wish to appear in the result as a
subscript to Π. The argument relation follows in parentheses.
Thus, we write the query to list all loan numbers and the amount of the loan as
Πloan-number, amount(loan)
“Find those customers who live in HYD.” We write:
Consider a query to find the names of all bank customers who have either an account or a loan or
both. Note that the customer relation does not contain the information, since a customer does not need
to have either an account or a loan at the bank. To answer this query, we need the information in the
depositor relation and in the borrower relation We know how to find the names of all customers with
a loan in the bank:
Πcustomer-name (borrower )
We also know how to find the names of all customers with an account in the bank:
Πcustomer-name (depositor)
To answer the query, we need the union of these two sets; that is, we need all customer names that
appear in either or both of the two relations.We find these data by the binary operation union,
The set-difference operation, denoted by −, allows us to find tuples that are in one relation but are not
in another. The expression r − s produces a relation containing those tuples in r but not in s. We can
find all customers of the bank who have an account but not a loan by
The Cartesian-product operation, denoted by a cross (×), allows us to combine information from any
two relations.We write the Cartesian product of relations r1 and r2 as r1 × r2.
Suppose that we want to find the names of all customers who have a loan at the Perryridge branch.
We need the information in both the loan relation and the borrower relation to do so. If we write
The rename operator, denoted by the lowercase Greek letter rho (ρ), lets us do this. Given a relational-
algebra expression E, the expression ρx (E) returns the result of expression E under the name x.
Additional Operations
The first additional-relational algebra operation that we shall define is set intersection (∩). Suppose
that we wish to find all customers who have both a loan and an account. Using set intersection, we can
write
The natural join of r and s, denoted by r s, is a relation on schema R and S formally defined as
follows: r s = ΠR ∪ S (σ r.A1 =s.A1 ∧ r.A2 =s.A2 ∧ ... ∧ r.An =s.An r × s)
Find the names of all branches with customers who have an account in the bank and who live in HYD.
Πbranch-name(σcustomer-city =“Harrison” (customer account depositor))
The division operation, denoted by ÷, is suited to queries that include the phrase “for all.” Suppose
that we wish to find all customers who have an account at all the branches located in Brooklyn. We
can obtain all branches in Brooklyn by the expression
Extended Relational-AlgebraOperations
Generalized Projection
Aggregate Functions
Aggregate functions take a collection of values and return a single value as a result.
The aggregate function count returns the number of the elements in the collection, Other common
aggregate functions include min and max,which return the minimum and maximum values in a
collection.
Outer Join
The outer-join operation is an extension of the join operation to deal with missing information. We can
use the outer-join operation to avoid this loss of information. There are actually three forms of the
operation: left outer join, denoted ( ); right outer join, denoted( ) ; and full outer join, denoted (
).
The left outer join ( ) takes all tuples in the left relation that did not match with any tuple in the
right relation, pads the tuples with null values for all other attributes from the right relation, and adds
them to the result of the natural join
The right outer join ( ) is symmetric with the left outer join: It pads tuples from the right relation
that did not match any from the left relation with nulls and adds them to the result of the natural join.
The full outer join( ) does both of those operations, padding tuples from the left relation that did
not match any from the right relation, as well as tuples from the right relation that did not match any
from the left relation, and adding them to the result of the join.