Chapter 9 Revision

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 19

TERMS

 Database (DB) – An integrated collection of stored data that is centrally


managed and controlled.
 Database management system (DBMS) – A system software
component that manages and controls one or more databases.
 Schema – database component that contains descriptive information
about the data stored in the physical data store.
 Structured Query Language (SQL) – A query language used to access
and update the data in a relational database.
 Data administrator (DA) – The person in charge of the structure and
integrity of the data
 Database administrator – The person in charge of the safety and
operation of the DBMS
 Relational database management system (RDBMS) – A DBMS that
organizes data in tables or relations.
 Tables – two-dimensional data structures consisting of columns and rows
 Row, tuple, or record – one horizontal group of data attribute values in a
table
 Column, attribute, field – one vertical group of data attribute values in
a table
 Attribute value, field value, or data element – the value held in a
single table cell
 Key – An attribute or set of attributes, the values of which are unique for
each row of the table, that is used to uniquely identify a row
 Candidate key – An attribute or set of attributes that are unique
identifiers and could serve as the primary key
 Primary key – The key chosen by a database designer to define
relationships by being used as a foreign key in other tables.
 Foreign key – An attribute that duplicates the primary key of a different
(or foreign) table.
 Referential integrity – Every value as a foreign key in one table must
have an equivalent value as the primary key in the associated table.
 Normalization – a formal technique for transforming a relational schema
to an equivalent one that minimizes data redundancy and eliminates
anomalies
 Database synchronization – updating one database copy with changes
made to other database copies.

Explain the responsibilities of the data


administrator and database administrator
The DA manages important aspects of data definition and database design,
including the following:
 Data Standards. Naming standards, definition standards, data typing
standards, and value edits.
 Data use. Ownership of data, accessibility of data, and confidentiality
 Data quality. Validation rules, completeness, currency, consistency, and
relevancy.
The DBA maintains the database after it has been deployed and manages the
safety and operation of the database. It is the DBA’s responsibility to ensure the
database – both the data and the DBMS – is configured correctly for the
organization’s architecture and performs effectively and efficiently. The DBA’s
responsibilities include:
 Managing a multiple DBMS environment
 Protecting the data and database, including user authentication and attack
prevention.
 Monitoring and maintaining high levels of performance
 Backing up the database and defining recovery procedures.

Design a relational database schema based


on a class diagram
Each table in a relational database must have a unique key. A key is an attribute
or set of attributes, the values of which are unique for each row of the table, that
is used to uniquely identify a row.
A key uniquely identifies each row in a table. There may be multiple groups of
attributes that are unique in each row and could serve as a primary key. This is
called a candidate key.
A primary key is chosen from the set of candidate keys to define the
relationships by being used as a foreign key in other tables.
A foreign key is an attribute that duplicates the primary key of a different (or
foreign) table.
For database design, the preferred starting point is the domain model class
diagram.

The steps are as follows:


1. Create a table for each class in the domain model
2. Choose a primary key for each table (invent one, if necessary)

3. Add foreign keys to represent one-to-many associations


Add the primary key attribute(s) of the “one” class to the table that represents
the “many” class.
4. Create new tables to represent many-to-many associations
There are two possible situations:
 Association with defined association class. Add the primary keys of
the endpoint classes (i.e., associated classes) as foreign keys in the
table that was created for the association class. These foreign keys
are always a candidate key and will frequently serve as a primary
key.
 Association without a class. Create a new table to represent
association. Add the primary keys of the endpoint classes as foreign
keys in the new table. These foreign keys are always a candidate
key. The concatenation of these primary keys are always a unique
identifier for the record and thus a candidate key.
5. Represent and adjust classification hierarchies
Generalization / specialization hierarchies are a special case in relational
database design. A table representing a specialised class inherits data and
methods from a table representing a generalized class. Thid inheritance is
represented as follows:
 Combining all the tablets into a single table containing a superset of
all classes
 Using separate tables to represent child classes, and using the
primary key of the parent class as the primary key of the child class
 Some combination of the previous two methods
6. Define referential integrity constraints
For relational databases, the term referential integrity describes a
consistent state between foreign key and primary key values.

A constraint stored in the schema that the DBMS uses to automatically


enforce referential integrity is called a referential integrity constraint

For example, a referential integrity constraint for SaleID in the SaleItem


table would be written in SQL as:
Evaluate and improve the quality of a
database schema
Continuing the steps from the last section:

7. Evaluate schema quality and make necessary improvements


One important characteristic of a correct database design is that all the
tables in the database be normalized. A normalized database has certain
characteristics that are important for accuracy and performance:
1. Allows flexibility in implementing future data model changes
2. Contains a minimum of redundant data
3. Prevents insertion, deletion, and update anomalies

A database schema is considered flexible and maintainable if changes to


the database schema can be made with minimal disruption to existing
data content and structure.

A table is in first normal form (1NF) if every field contains only one
value. The formal way to state this is that all attribute values must be
atomic.

Non-atomic fields can be illustrated in 2 ways:


1. Either you have attributes with multiple values in them, which
are called multivalued attributes

2. You have a table with multiple columns with the same name

1nf prohibits attributes such as ‘Dependent’ shown above. To solve this


problem. The table must be divided into two tables. Each row has a unique
key. The solution is in the figure below which is in first normal form.
Functional dependency is a relationship between columns such that the
values in one (or more) column determines the values in a second column
(or set of columns). The association is formally stated as follows:

The most precise way to determine if functional dependency exists is to


pick two attributes and insert their names as A and B and ask if the
statement is true.

A table is in Second Normal form (2NF) if it is in 1NF and if each non-


key attribute is functionally dependent on the entire primary key. A table
violates 2NF when a non-key attribute is functionally dependent on only
part of the primary key, which is only possible if the primary key contains
multiple attributes.
When a table’s primary key consists of two or more attributes, the analyst
must examine functional dependency of non-key attributes on each
portion of the primary key.
To be in 2NF, each non-key attribute must be functionally dependent on
the entire primary key consisting of the combination.
If a non-key attribute is functionally dependent on only part of the primary
key, then you must remove the non-key attribute from its present table
and place it in another table to satisfy the requirements of 2NF
A table is in third normal form (3NF) if it is in 2NF and if no non-key
attribute is functionally dependent on any other non-key attribute. To
verify that a table is in 3NF, you must check the functional dependency of
each non-key attribute against every other non-key attribute. The number
of functional dependencies to be checked is:

functional dependencies must be checked in both directions.

There are 2 common violations of 3NF.


 An attribute that can be computed by a formula or algorithm using
other table values as input. Common examples include subtotals,
totals, and taxes
 Computational dependencies are a form of redundancy because a
change to the value of any input variable in the computation also
changes the result of the computation. To correct this, remove the
computed attribute from the database.

8. Choose appropriate data types


A data type defines the storage format and allowable content of a
program variable, class attribute, or relational database attribute or
column.
Primitive data types are supported directly by computer hardware and
programming languages and include integers, single characters, and real
numbers.
Complex data types are combinations of or extensions to primitive data
types that are supported by programming languages, operating systems,
and DBMS’s. E.g. arrays, tables
Describe the different methods for
configuring distributed databases
Continuing from the steps:
9. Incorporate integrity and security control

 Decentralized database is a database stored at multiple locations


without needing to be interconnected through a network or
synchronized. Each database may have the same structure and
schema as the other databases but does not need to relate to
internet or wide area networks (WANs). In this situation, the data is
purely local, even though the database configuration may be the
same.
 Homogenous distributed database is a database distributed
across multiple locations with the same DBMS, and all database
access coordinated by a global schema. It is used in the situation
where the data need to be shared or at least available throughout
the reach of the organization.

 Heterogeneous distributed database is a database distributed


across multiple locations with different DBMSs and with local access
allowed without global schema coordination. This configuration
combines the features of the two previous configurations, namely,
that there are some users and queries that are purely local,
combined with other users and queries that require global access.
With a heterogeneous configuration, different locations may also
utilize distinct DBMSs, especially where most of the queries are local
and only occasional queries require global interfaces.
There are four distribution strategies:
1. Data replication
 In data replication setup, complete copies of a database are hosted
at multiple locations or server farms, typically distributed
geographically.
 These copies are synchronized periodically by cooperating DBMSs.
 The system allows applications to redirect access requests to other
servers if one becomes unavailable, often prioritizing the closest
server for faster response times.
 However, replication comes with challenges. When a database
update occurs on one server, other databases can be temporarily
outdated until synchronization occurs.
 This lag between updates and synchronization can lead to clients
receiving outdated information.
 More frequent synchronization improves data accuracy but
increases the demand on system resources.
 The advantages of replicated databases include faster response
times, as each location operates independently
 And higher availability, as other locations can take over if one
server goes down.
 Additionally, transaction processing is simpler since only one server
is impacted by updates.
 However, disadvantages include higher storage requirements due to
multiple database copies and the complexity of keeping all locations
synchronized efficiently.
2. Horizontal partitioning
 Occurs when a database table is divided by rows, with some records
stored at one location and others at different locations.
 A global schema is used to direct queries and updates to the
appropriate data centre
 While this approach simplifies database management and allows
each data centre to operate independently, it becomes more
complex as customers with global accounts may require access to
multiple locations, necessitating data synchronization or
duplication.
 Additionally, combining data from all locations is required for unified
reporting or analysis
 The key advantage is simplicity, with optimized local access and
localized security.
 However, a significant disadvantage is the lack of redundancy,
making the system vulnerable to outages if one location becomes
unavailable.

3. Vertical partitioning
 Involves distributing entire tables or specific columns of a base
table across different locations.
 For example, in a vertically partitioned database, some columns
from a table might be stored at location A, others at location B, and
additional columns at location C.
 While distributing entire tables is straightforward, distributing only
specific columns from a table is more complex and generally more
difficult than horizontal partitioning.
 Vertical partitioning is typically used when different locations or
functional areas require access to distinct sets of data from the
same table.
 The complexity of vertical partitioning is balanced by its usefulness
in situations where functional areas need different data elements
from the same table.
 However, this approach requires careful management to ensure that
the split data can be efficiently accessed and maintained across
distinct locations
4. Combinations of the above three strategies
 This combination can be used to ensure that data is available in the
right place, in the right form, for large, geographically distributed
organizations.
 Vertical partitioning is used when different functional areas in
separate locations require specific data
 Horizontal portioning is used for functional areas, like Sales, that
need access to customer and sales data across all locations.
 Replicated tables are used for corporate-wide data that must be
available at every location.
 This complex configuration is overseen by data administrators and
database administrators, as it requires careful design and
maintenance.

Explain the importance of and methods for


protecting the database
Transaction logging is a method used to record all updates made to a
database, including details such as user ID, date, time, input data, and the type
of update
This log is stored separately from the database and serves as an audit trail,
allowing independent checks of any database changes. Most DBMSs include
transaction logging, though it can be customized by database designers or
administrators.
The primary objectives of transaction logging are two-fold:
 It discourages fraudulent or malicious activity, as users know their actions
are being logged and can be traced back to them.
 It provides a recovery mechanism for correcting errors by allowing
erroneous transactions to be undone or “unapplied”. More advanced
logging systems also capture “before” and “after” images of data
changes, offering an additional layer of control for sensitive or critical data
files, which can be crucial for maintaining data integrity.
Concurrency and complex update controls.
Databases are central to nearly all electronic activities, from online shopping to
accessing social media. These large, active databases must be fast and designed
to allow multiple users to read and update data without causing issues, such as
inadvertently overwriting each other’s updates. For example, when several users
attempt to by the same product online, updates to inventory must be managed
efficiently to prevent the “lost update problem”, where one user’s changes
overwrite another’s.
Two key concepts help prevent such problems:
 Transactions: A transaction involves multiple steps, for example, buying
a movie ticket online:
o Read the database for available seats and display it
o Update the database when the user chooses a seat
o Start a timer
o Accept user purchase information (credit card number)
o Verify credit and send a charge to the user’s credit card
o Post the user’s payment to the company ledger
o Permanently update the seat database that the chosen seats are
now taken
o Update the user information with a reservation for the specific
seats.
If any step fails (e.g. due to credit card issues), all previous steps are
undone to restore the database to its original state.
 Database locks: Locking a portion of the database prevents other users
from accessing or modifying the same data until the first user completes
their operation. Locks can be applied to different levels, such as columns,
rows, or entire tables. Smaller locks allow for more efficiency in multi-user
environments, while larger locks are simpler to manage but reduce overall
efficiency. There are two types of locks:
o Shared (read) lock: Allows multiple users to read the same data
but prevents updates.
o Exclusive (write) lock: Prevents other users from reading or
updating the data while it is locked by a user who is updating it.
By combining transactions and locks, databases ensure data integrity and
prevent conflicts in muti-user environments.

You might also like