SS2 TERM 1
The term data model can be used in two related senses:
It is a description of the objects represented by a computer system together with their
properties and relationships; these are typically real-world objects, such as products,
suppliers, customers, orders etc.
It is a collection of concepts and rules used in defining data models, e.g. the
relational model uses relations and tuples, while the network model uses records, sets and
fields.
Definition:- A data model is a conceptual representation of the data structures that are required by a
database.
The data structures include the data objects, the associations between data objects, and the rules
that govern operations on the objects. As the name implies, the data model focuses on what data
is required and how it should be organized, rather than on what operations will be performed on
the data.
A data model is independent of hardware or software constraints. Rather than representing the
data as a database would see it, the data model focuses on representing the data as the user sees it
in the real world. The data model serves as a bridge between the concepts that make up real-world
events and processes and the physical representation of those concepts in a database. A
common analogy is that a data model is equivalent to an architect’s building plans.
Data models are often used as an aid to communication between the business people
who define the requirements for a computer system and the technical people who define the design in
response to those requirements. They are used to show the data needed and created by business
processes.
A data model can be thought of as a diagram or flowchart that illustrates the relationships
between data. Although capturing all the possible relationships in a data model can be very
time-intensive, it is an important step and should not be rushed. Well-documented models allow
stakeholders to identify errors and make changes before any programming code has been written.
Data modellers often use multiple models to view the same data and ensure that all
processes, entities, relationships and data flows have been identified. A data model is based on data,
data relationships, data semantics and data constraints. It provides the details of the information to be
stored, and is of primary use where the final product is the generation of computer software code
for an application or the preparation of a functional specification to aid a computer software
make-or-buy decision.
There are three different types of data models produced while progressing from requirements to
the actual database to be used for the information system. They are:
Conceptual data model:- The data requirements are initially recorded as a conceptual data
model, which is a set of technology-independent specifications about the data and is used to
discuss initial requirements with the business stakeholders.
Conceptual Data Model
A conceptual data model identifies the highest-level relationships between the different entities.
Features of conceptual data model include:
Includes the important entities and the relationships among them.
No attribute is specified.
No primary key is specified.
The figure below is an example of a conceptual data model.
Conceptual Data Model
From the figure above, we can see that the only information shown via the conceptual data
model is the entities that describe the data and the relationships between those entities. No other
information is shown through the conceptual data model.
Logical data model:- The conceptual data model is then translated into a logical data model. This
is where the documentation of the data that can be implemented in the database takes place.
Implementation of one conceptual data model may require multiple logical data models.
A logical data model describes the data in as much detail as possible, without regard to how it
will be physically implemented in the database. Features of a logical data model include:
Includes all entities and relationships among them.
All attributes for each entity are specified.
The primary key for each entity is specified.
Foreign keys (keys identifying the relationship between different entities) are specified.
Normalization occurs at this level.
The steps for designing the logical data model are as follows:
Specify primary keys for all entities.
Find the relationships between different entities.
Find all attributes for each entity.
Resolve many-to-many relationships.
Normalization.
The figure below is an example of a logical data model.
Logical Data Model
Comparing the logical data model shown above with the conceptual data model diagram, we see
the main differences between the two:
In a logical data model, primary keys are present, whereas in a conceptual data model, no
primary key is present.
In a logical data model, all attributes are specified within an entity. No attributes are specified in
a conceptual data model.
Relationships between entities are specified using primary keys and foreign keys in a logical
data model. In a conceptual data model, the relationships are simply stated, not specified, so we
simply know that two entities are related, but we do not specify what attributes are used for this
relationship.
Physical data model:- The last step is to transform the logical data model into a physical data
model. Here the data is organized into tables, accounting for access, performance and storage
details.
A physical data model represents how the model will be built in the database. A physical database
model shows all table structures, including column names, column data types, column constraints,
primary keys, foreign keys, and relationships between tables. Features of a physical data model
include:
Specification of all tables and columns.
Foreign keys are used to identify relationships between tables.
Denormalization may occur based on user requirements.
Physical considerations may cause the physical data model to be quite different from the logical
data model.
The physical data model will be different for different RDBMSs. For example, the data type for a column
may be different between MySQL and SQL Server.
The steps for physical data model design are as follows:
Convert entities into tables.
Convert relationships into foreign keys.
Convert attributes into columns.
Modify the physical data model based on physical constraints / requirements.
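The conversion steps above can be sketched with SQLite (via Python's built-in sqlite3 module). The customer/order entities, columns and types below are invented purely for illustration; they are not from this text.

```python
import sqlite3

# Hypothetical customer/order entities, used only to illustrate the steps:
# entities -> tables, attributes -> columns, relationship -> foreign key.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,   -- primary key from the logical model
        name    TEXT NOT NULL          -- attribute converted to a typed column
    )""")
cur.execute("""
    CREATE TABLE customer_order (
        order_id  INTEGER PRIMARY KEY,
        cust_id   INTEGER NOT NULL REFERENCES customer(cust_id),  -- relationship as FK
        placed_on TEXT                 -- physical type choices vary per RDBMS
    )""")

# Confirm both tables exist in the catalogue.
tables = [r[0] for r in cur.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

The same logical model ported to another RDBMS would keep the entities and keys but might use different column types, as the text notes.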
The figure below is an example of a physical data model.
Physical Data Model
Comparing the physical data model shown above with the logical data model diagram, we see the
main differences between the two:
Entity names are now table names.
Attributes are now column names.
Data type for each column is specified.
Data types can be different depending on the actual database being used.
Hierarchical model:- This model organizes data in a tree structure in which each record has a
single parent, so it supports only one-to-many relationships.
[Figure: Hierarchical model, showing academic reports and financial reports as child nodes]
Network model:- This model is similar to the hierarchical model, except that each child
may have more than one parent. The model offers many-to-many relationships, as against
the hierarchical model, which is one-to-many.
Network model
Relational model:- This is the most common of all the models. It represents all
the data in the database in two-dimensional tables. Each row in the table is called
a tuple, while each column represents an attribute. A tuple represents a record
and an attribute represents a field.
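A quick sketch of rows-as-tuples using Python's built-in sqlite3 module; the product table and its values are invented for illustration.

```python
import sqlite3

# A one-row product relation; sqlite3 hands each row back as a plain
# Python tuple, matching the model's row = tuple, column = attribute idea.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE product (product_id INTEGER, name TEXT, unit_price REAL)")
cur.execute("INSERT INTO product VALUES (1, 'Widget', 2.5)")

row = cur.execute("SELECT product_id, name, unit_price FROM product").fetchone()
# row is the tuple (the record); each position holds one attribute (field) value.
```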
Object-oriented model:- This is similar to a relational database model, but
objects classes and inheritance are directly supported in database schemes and in
the query language.
Star schema:- This is the simplest style of data warehouse schema. The star
schema consists of a few “fact tables” (possibly only one, justifying the name)
referencing any number of “dimension tables”.
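A minimal star schema can be sketched in SQLite; the fact and dimension tables below are assumptions chosen only to illustrate the shape, not tables from this text.

```python
import sqlite3

# One fact table referencing two dimension tables; all names are invented.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT)")
cur.execute("""
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        quantity   INTEGER,
        amount     REAL
    )""")
cur.execute("INSERT INTO dim_product VALUES (1, 'Widget')")
cur.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01')")
cur.execute("INSERT INTO fact_sales VALUES (1, 20240101, 3, 7.5)")

# A typical star-schema query joins the fact table out to its dimensions.
total = cur.execute("""
    SELECT SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
""").fetchone()[0]
```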
DATA STRUCTURE DIAGRAM:- A data structure diagram (DSD) is a diagram and
data model used to describe conceptual data models by providing graphical notations
which document entities, their relationships, and the constraints that bind them. The
basic graphic elements of DSDs are boxes, representing entities, and arrows, representing
relationships. DSDs are most useful for documenting complex entities. DSDs are an
extension of the ER-model. In DSDs, attributes are specified inside the entity boxes rather
than outside of them, while relationships are drawn as boxes composed of attributes which
specify the constraints that bind entities together. The ER-model does not provide a way to
specify the constraints between relationships and is therefore cumbersome when
representing entities with several attributes. The ER-model focuses on the relationships
between different entities, while DSDs focus on the relationships of the elements
within an entity and enable users to fully see the links and relationships between each
entity.
ENTITY-RELATIONSHIP MODEL:-This is a model that allows us to describe the
data involved in a real world enterprise in terms of objects and their relationships and it
is widely used to develop an initial database design. It provides useful concepts that
allow us to move from an informal description of what users want for their database to a
more detailed and precise description that can be implemented in a DBMS. This model is
an abstract conceptual data model or semantic data model used in software engineering
to represent structured data.
GEOGRAPHIC DATA MODEL:- This is a data model in a geographic information
system, which is a mathematical construct for representing geographic objects or surfaces
as data. Examples of this model are:
The vector data model represents geography as collections of points,
lines, and polygons.
The raster data model represents geography as cell matrices that store
numeric values.
The triangulated irregular network (TIN) data model represents geography as
sets of contiguous, non-overlapping triangles.
GENERIC DATA MODEL:- These are generalizations of conventional data models.
They define standardized general relation types. They were developed as an approach to
solve some shortcomings of the conventional data models. For example, two modellers
will usually produce different conventional data models of the same domain. This can lead to
difficulty in bringing the models of different people together and is an obstacle for data
exchange and data integration. Invariably, however, this difference is attributable to
different levels of abstraction in the models and differences in the kinds of facts that can
be instantiated (the semantic expression capabilities of the models). The modellers need to
communicate and agree on certain elements which are to be rendered more concrete in
order to make the differences less significant.
SEMANTIC DATA MODEL:- This is a technique in software engineering to define the
meaning of data within the context of its interrelationships with other data. A semantic
data model is an abstraction which defines how the stored symbols relate to the real
world. It is sometimes called a conceptual data model.
DATA MODELING
Data modelling is the analysis of data objects that are used in a business or other context and the
identification of the relationships among these data objects. Data modelling is a first step in
doing object-oriented programming.
Data modelling is the formalization and documentation of existing processes and events that
occur during application software design and development.
Identify the keys for each entity:- Keys are used to look up a row of data. Keys describe the
minimum amount of data necessary to identify a particular thing. Often a computer-generated
number is used for the key, for example an employee ID number.
Identify the attributes for each entity:- Attributes are the information which needs to be kept
about an entity. For example, the attributes of a worker may include the worker’s name and e-mail
address.
Note:- Data models are expressed in an ERD and a data element dictionary (DED). The ERD is a
schematic representation of the database, while the DED is the text representation. The two must be
combined to get a clear picture of the data model.
DATA MODELLING APPROACHES
The following are the data modelling approaches
Semantic modelling
Relational modelling
Entity-Relationship modelling
Binary modelling
Semantic modelling:- This is modelling that uses the concept of type. A type is the
collection of a certain number of properties into a unit or component. The properties
are also considered as types.
Example: Student registering for a course.
type registration = student, course
type student = name, student ID, street address, city, state, zip code
type course = course name, course number, day-of-week, time
[Figure: aggregation diagram with Registration placed above Student and Course]
Note:
Aggregation is represented in the model diagram by placing it above the
properties.
Base types such as name, student ID, address and so on are not represented in the
graphical notation.
Relational modelling:- This uses the concept of a mathematical relation, which forms the
basis for the data structure in the relational model. A relation is visualized as a
two-dimensional table with rows and columns containing only atomic values. The
example of student registration is defined in relational modelling as follows:
Registration (student ID, course number)
Student (name, student ID, street address, city, state, zip code)
Course (course name, course number, day-of-week, time)
The relation registration has two attributes, forming two table columns. The number of rows in each
table depends on the actual data stored. Each row is uniquely identified by the values of the columns
in bold type.
Entity-relationship modelling:- This uses the concepts of entity type, attribute type
and relationship type to form a complete ER-model, as shown below, using the
example of a student registering for a course.
[Figure: ER diagram for student registration, with Student and Course entities and attributes
such as name, student ID, street address, city, state, zip code, course name and course number]
Binary modelling:- This separates the object types into lexical and non-lexical
object types. Lexical object types are those that can be used as names for other
object types or as references to other object types. Non-lexical object types are
named object types, or those referred to by other object types. The relationship
between a lexical and a non-lexical object type is called a bridge type. A relationship
between two non-lexical object types is called an idea type. Graphical constraints may
be imposed on an information structure diagram of the binary model. The uniqueness
constraint and the totality constraint are among the imposable constraints. In a binary
model it must always be possible to refer uniquely to a non-lexical object type; that
is, each non-lexical object type must be referable. The information structure diagram for the
student registration example becomes more complex than with the other modelling
approaches.
Note:- Dotted circles denote lexical object types and closed circles represent non-lexical
object types. Bridge types indicate the relationships between student and course. Uniqueness
constraints are indicated by “u” and totality constraints by “v”.
The diagram below illustrates binary modelling.
[Figure: binary-model information structure diagram for the student registration example,
with uniqueness constraints marked “u” and totality constraints marked “v”]
MODELLING METHODOLOGIES
Data models represent information areas of interest. There are several ways of creating data
models, but these two methodologies stand out:
Bottom-up methodology
Top-down methodology
Bottom-up methodology:- These are methodologies that usually start with existing data structures:
forms, fields on application screens, or reports.
Top-down methodology:- These are methodologies in which models are created in an abstract way
by getting information from people who know the subject area.
Note:- The most common method for building data models for relational databases is the
entity-relationship model.
The data model is one part of the conceptual design process. The other, typically, is the
functional model. The data model focuses on what data should be stored in the database,
while the functional model deals with how the data is processed. To put this in the
context of the relational database, the data model is used to design the relational tables.
The functional model is used to design the queries which will access and perform
operations on those tables.
A data model instance is produced by applying a data model theory to create a practical
data model instance for some particular application.
When standard data models are in use, they enable easier and faster information sharing, because
heterogeneous organizations have a standard vocabulary and pre-negotiated semantics, formats, and
quality standards for exchanged data. This standardization has an impact on software architecture, as
solutions that vary from the standard may cause data-sharing issues and problems if data is out of
compliance with the standard.
The more effective standard models have developed in the banking, insurance, pharmaceutical
and automotive industries, to reflect the stringent standards applied to customer information
gathering, customer privacy, consumer safety, or just in time manufacturing.
Typically these use the popular relational model of database management, but some use the
hierarchical model, especially those used in manufacturing or mandated by governments, e.g.,
the DIN codes specified by Germany. While the format of the standard may have
implementation trade-offs, the underlying goal of these standards is to make sharing of data
easier.
The most complex data models known are in military use, and consortia such as NATO tend to
require strict standards of their members' equipment and supply databases. However, they
typically do not share these with non-NATO competitors, and so calling these 'standard' in the
same sense as commercial software is probably not very appropriate.
An emerging area of standard data model is in the identity card arena, where a vast number of
security engineering solutions for public spaces, e.g., airports, other public transport, hospitals,
are expected soon to rely on a standard data model for identifying the card holder/user of the
facility. This may contain biometric information or other data that would be standardized across
an entire trade bloc, e.g., the European Union or the North American Free Trade Agreement
(NAFTA). This raises many privacy and carceral state concerns. These are discussed more
deeply in an article on standard user models.
NORMALIZATION
Normalization is a process in which an initial DB design is transformed, or decomposed, into a
different, but equivalent, design. The resulting schema is equivalent to the original one in the
sense that no information is lost when going from one to the other.
The normalization procedure consists of a sequence of projections; that is, some attributes are
extracted from one table to form a new one. In other words, tables are split up vertically. The
decomposition is lossless only if you can restore the original table by joining its projections.
Through such non-loss decompositions it is possible to transform an original schema into a
resulting one that satisfies certain conditions, known as normal forms:
Tables that contain redundant data can suffer from update anomalies, which can introduce
inconsistencies into a database.
The rules associated with the most commonly used normal forms are those of the first (1NF),
second (2NF), and third (3NF) normal forms.
Various types of update anomalies, such as insertion, deletion, and modification anomalies, can be
found in tables that break the rules of 1NF, 2NF, and 3NF; such tables are likely to contain
redundant data and suffer from update anomalies.
Normalization is a technique for producing a set of tables with desirable properties that support the
requirements of a user or company.
A major aim of relational database design is to group columns into tables so as to minimize data
redundancy and reduce the file storage space required by the base tables.
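The projection-and-join idea can be sketched in plain Python: split a table into two projections, rejoin them, and confirm the decomposition was lossless. The StaffBranch-style sample rows are invented for illustration.

```python
# A lossless decomposition in plain Python: split a table into two
# projections, then rejoin them and check nothing was lost or invented.
staff_branch = [
    # (staff_no, name, branch_no, branch_address)
    ("S1", "Ann",  "B01", "12 High St"),
    ("S2", "Bola", "B01", "12 High St"),
    ("S3", "Carl", "B02", "3 Market Rd"),
]

# Projection 1: staff attributes plus the branch number.
staff = {(s, n, b) for (s, n, b, _) in staff_branch}
# Projection 2: branch attributes only.
branch = {(b, a) for (_, _, b, a) in staff_branch}

# A natural join on branch_no reconstructs the original table.
rejoined = {(s, n, b, a) for (s, n, b) in staff for (b2, a) in branch if b == b2}

lossless = rejoined == set(staff_branch)
```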
The First Normal Form (1NF) addresses the structure of an isolated table.
The Second (2NF), Third (3NF), and Boyce-Codd (BCNF) Normal Forms address
one-to-one and one-to-many relationships.
The Fourth (4NF) and Fifth (5NF) Normal Forms deal with many-to-many
relationships.
These Normal Forms form a hierarchy in such a way that a schema in a higher normal form
automatically fulfils all the criteria for all of the lower Normal Forms.
The Fifth Normal Form is the ultimate normal form with respect to projections and joins -- it is
guaranteed to be free of anomalies that can be eliminated by taking projections.
Functional dependency
The Second and Third Normal Forms address dependencies among attributes, specifically
between key and non-key fields.
By definition, a key uniquely determines a record: Knowing the key determines the values of all
the other attributes in the table row, so that given a key, the values of all the other attributes in
the row are fixed.
This kind of relationship can be formalized as follows. Let X and Y be attributes (or sets of
attributes) of a given relation. Then Y is functionally dependent on X if, whenever two
records agree on their X-values, they must also agree on their Y-values. In this case, X is called
the determinant and Y is called the dependent. Since for any X there must be a single Y, this
relationship represents a single-valued functional dependency. If the set of attributes in the
determinant is the smallest possible (in the sense that after dropping one or more of the attributes
from X, the remaining set of attributes no longer uniquely determines Y), then the
dependency is called irreducible.
Note that functional dependency is a semantic relationship: It is the business logic of the
problem domain, represented by the relation, which determines whether a certain X determines
Y.
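The definition can be turned into a small checker: Y is functionally dependent on X exactly when no two rows agree on X but disagree on Y. The sample staff rows and column names are invented for illustration.

```python
def holds_fd(rows, x, y):
    """Return True if the FD x -> y holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        val = tuple(row[a] for a in y)
        # Two rows agreeing on X but disagreeing on Y breaks the FD.
        if key in seen and seen[key] != val:
            return False
        seen[key] = val
    return True

staff = [
    {"staff_no": "S1", "branch_no": "B01", "tel": "555-1000"},
    {"staff_no": "S2", "branch_no": "B01", "tel": "555-1000"},
    {"staff_no": "S3", "branch_no": "B02", "tel": "555-2000"},
]

fd1 = holds_fd(staff, ["branch_no"], ["tel"])  # branch_no determines tel
fd2 = holds_fd(staff, ["tel"], ["staff_no"])   # fails: two staff share one tel
```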
Note:- Every field in a record must depend on The Key (1NF), the Whole Key (2NF), and
Nothing But The Key (3NF).
Delete the columns you just moved from the original table, except for the determinant,
which will serve as a foreign key.
The original table may be renamed to maintain semantic meaning.
Logical database design (it will be easy to comprehend and suitable for further development)
Each attribute (column) must be a fact about the key, the whole key, and nothing but the key
Keys in tables
Primary key
The primary key is an attribute or a set of attributes that uniquely identify a specific instance of an
entity. Every entity in the data model must have a primary key whose values uniquely identify instances
of the entity.
To qualify as a primary key for an entity, an attribute must have the following properties:
* It must have a non-null value for each instance of the entity
* The value must be unique for each instance of an entity
* The values must not change or become null during the life of each entity instance
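A sketch of the first two properties being enforced, using Python's built-in sqlite3 module; the employee table and values are invented for illustration.

```python
import sqlite3

# SQLite rejects both a duplicate and a null primary-key value.
# (NOT NULL is stated explicitly: SQLite's legacy behaviour otherwise
# permits NULL in some declared primary keys.)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_id TEXT PRIMARY KEY NOT NULL, name TEXT)")
cur.execute("INSERT INTO employee VALUES ('01267', 'Clark')")

errors = []
try:
    cur.execute("INSERT INTO employee VALUES ('01267', 'Duplicate')")
except sqlite3.IntegrityError:
    errors.append("duplicate key rejected")
try:
    cur.execute("INSERT INTO employee VALUES (NULL, 'No key')")
except sqlite3.IntegrityError:
    errors.append("null key rejected")
```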
Composite Keys
Sometimes more than one attribute is required to uniquely identify an entity. A primary key that is
made up of more than one attribute is known as a composite key.
Artificial Keys
An artificial key is one that has no meaning to the business or organization. Artificial keys are
permitted when
No attribute has all the primary key properties, or
The primary key is large and complex.
Foreign Keys
A foreign key is an attribute that completes a relationship by identifying the parent entity.
Foreign keys provide a method for maintaining integrity in the data (called referential integrity)
and for navigating between different instances of an entity. Every relationship in the model must
be supported by a foreign key.
This property simplifies data access because developers and users can be certain of the type of
data contained in a given column. It also simplifies data validation. Because all values are from
the same domain, the domain can be defined and enforced with the Data Definition Language
(DDL) of the database software.
DATA INTEGRITY
Data integrity means, in part, that you can correctly and consistently navigate and manipulate the
tables in the database. There are two basic rules to ensure data integrity; entity integrity and
referential integrity.
The entity integrity rule states that the value of the primary key, can never be a null value (a null
value is one that has no value and is not the same as a blank). Because a primary key is used to
identify a unique row in a relational table, its value must always be specified and should never
be unknown. The integrity rule requires that insert, update, and delete operations maintain the
uniqueness and existence of all primary keys.
The referential integrity rule states that if a relational table has a foreign key, then every value of
the foreign key must either be null or match the values in the relational table in which that
foreign key is a primary key.
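Both halves of the referential integrity rule can be sketched with SQLite, which enforces foreign keys once the foreign_keys pragma is switched on; the branch/staff tables are illustrative.

```python
import sqlite3

# Referential integrity: a foreign-key value must be NULL or match a
# primary-key value in the parent table.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
cur = conn.cursor()
cur.execute("CREATE TABLE branch (branch_no TEXT PRIMARY KEY)")
cur.execute("""CREATE TABLE staff (
    staff_no TEXT PRIMARY KEY,
    branch_no TEXT REFERENCES branch(branch_no))""")
cur.execute("INSERT INTO branch VALUES ('B001')")

cur.execute("INSERT INTO staff VALUES ('S1', 'B001')")  # matching FK: allowed
cur.execute("INSERT INTO staff VALUES ('S2', NULL)")    # null FK: allowed
rejected = False
try:
    cur.execute("INSERT INTO staff VALUES ('S3', 'B999')")  # no such branch
except sqlite3.IntegrityError:
    rejected = True
```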
CREATING TABLES IN NORMAL FORMS
Example: The following are the paper forms used for recording data about EMPLOYEES and
their QUALIFICATIONS in a certain company.
Employee Number: 01267          Employee Name: Clark
Department Number: 05           Department Name: Auditing          Department Location: HQ

Qualification                   Year
Bachelor of Art                 1970
Master of Art                   1973
Doctor of Philosophy            1976
Question: Create a table for the form and normalize the table.
Solutions:
Step 1: Create a table for the form
EMPLOYEE TABLE
Employee Number | Employee Name | Dept. Number | Dept. Name | Dept. Location
01267           | Clark         | 05           | Auditing   | HQ

QUALIFICATION TABLE
Employee Number | Qualification Description | Qualification Year
01267           | BA Art                    | 1970
Note:- In the tables above we have removed the “repeating group” of qualification data
(consisting of qualification descriptions and years) to its own table. We hold the employee number in
the second table to serve as a cross-reference to the first table, because we need to know to
whom each of the qualifications belongs. With the Qualification table there is no limit on the
number of qualifications that any given employee may have.
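The normalized design can be sketched with Python's sqlite3 module: with the repeating group in its own table, any number of qualifications can be stored per employee.

```python
import sqlite3

# The 1NF split above: employee facts in one table, the repeating
# group of qualifications in another, keyed back by employee number.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE employee (
    emp_no TEXT PRIMARY KEY, emp_name TEXT,
    dept_no TEXT, dept_name TEXT, dept_location TEXT)""")
cur.execute("""CREATE TABLE qualification (
    emp_no TEXT REFERENCES employee(emp_no),
    description TEXT, year INTEGER,
    PRIMARY KEY (emp_no, description))""")

cur.execute("INSERT INTO employee VALUES ('01267','Clark','05','Auditing','HQ')")
cur.executemany("INSERT INTO qualification VALUES (?,?,?)", [
    ("01267", "Bachelor of Art", 1970),
    ("01267", "Master of Art", 1973),
    ("01267", "Doctor of Philosophy", 1976),
])

# Any number of qualifications per employee is now possible.
n = cur.execute(
    "SELECT COUNT(*) FROM qualification WHERE emp_no = '01267'").fetchone()[0]
```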
Insertion anomalies
1. To insert the details of a new member of staff (staffNo, name, position and salary) located at
a given branch into the StaffBranch table, we must also enter the correct details for that branch
(branchNo, branchAddress and telNo). For example, to insert the details of a new member of
staff at branch B002, we must enter the correct details of branch B002 so that the branch details
are consistent with values for branch B002 in other records of the StaffBranch table. The data
shown in the StaffBranch table is also shown in the Staff and Branch tables. These tables do not
have redundant data and do not suffer from this potential inconsistency, because for each staff
member we only enter the appropriate branch number into the Staff table. In addition, the details
of branch B002 are recorded only once in the database, as a single record in the Branch table.
2. To insert details of a new branch that currently has no members of staff into the StaffBranch
table, it’s necessary to enter NULLs into the staff-related columns, such as staffNo. However, as
staffNo is the primary key for the StaffBranch table, attempting to enter nulls for staffNo
violates entity integrity, and is not allowed. The design of the tables shown in Staff and Branch
avoids this problem because new branch details are entered into the Branch table separately from
the staff details. The details of staff ultimately located at a new branch can be entered into the
Staff table at a later date.
Deletion anomalies
If we delete a record from the StaffBranch table that represents the last member of staff located
at a branch, the details about that branch are also lost from the database. For example, if we
delete the record for staff Art Peters (S0415) from the StaffBranch table, the details relating to
branch B003 are lost from the database. The design of the tables that separate the Staff and
Branch table avoids this problem because branch records are stored separately from staff records
and only the column branchNo relates the two tables. If we delete the record for staff Art Peters
(S0415) from the Staff table, the details on branch B003 in the Branch table remain unaffected.
Modification anomalies
If we want to change the value of one of the columns of a particular branch in the StaffBranch
table, for example the telephone number for branch B001, we must update the records of all staff
located at that branch (row 1 and 2). If this modification is not carried out on all the appropriate
records of the StaffBranch table, the database will become inconsistent. In this example, branch
B001 would have different telephone numbers in different staff records.
The above examples illustrate that the Staff and Branch tables have more desirable properties
than the StaffBranch table.
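A sketch of why the decomposed design avoids the modification anomaly: the telephone number is stored in exactly one Branch row, so a single UPDATE corrects it for every member of staff. The sample data is invented for illustration.

```python
import sqlite3

# Decomposed Staff and Branch tables: the branch telephone number
# lives in one row, so one UPDATE fixes it for all staff at the branch.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE branch (branch_no TEXT PRIMARY KEY, tel_no TEXT)")
cur.execute("CREATE TABLE staff (staff_no TEXT PRIMARY KEY, branch_no TEXT)")
cur.execute("INSERT INTO branch VALUES ('B001', '555-1000')")
cur.executemany("INSERT INTO staff VALUES (?, ?)",
                [("S1", "B001"), ("S2", "B001")])

# One row changes, yet every staff member sees the new number.
cur.execute("UPDATE branch SET tel_no = '555-9999' WHERE branch_no = 'B001'")
tels = [r[0] for r in cur.execute("""
    SELECT b.tel_no FROM staff s
    JOIN branch b ON b.branch_no = s.branch_no""")]
```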
STAFF TABLE
BRANCH TABLE
The StaffBranch table has redundant data; the details of a branch are repeated for every
member of staff, for example rows 1 and 2, and rows 3 and 4, on branchNo, branchAddress and
telNo.
In contrast, the branch information appears only once for each branch in the Branch table,
and only the branch number (branchNo) is repeated in the Staff table, to represent where
each member of staff is located.
The Second normal form (2NF)
2NF ONLY applies to tables with composite primary keys (primary keys made up of more than one column).
A table is in 2NF if it is in 1NF and the values of each non-primary-key column can be
worked out from the values in ALL the columns that make up the primary key.
Equivalently, a table is in 2NF if each non-key column (one that is not part of any candidate key)
depends on ALL of every candidate key, NOT on a subset of any candidate key.
A 2NF violation occurs when there is a functional dependency (FD) in which part of a key
(instead of the whole key) determines a non-key column. An FD with a single-column
left-hand side (LHS) cannot violate 2NF.
For example, the TempStaffAllocation table in the following figure is NOT in 2NF, because
branchAddress depends on branchNo only, not on both staffNo AND branchNo (staffNo and
branchNo together are the candidate key, serve as the primary key, and form a composite key
because it has more than one column). Similarly, the values in the name and position columns
depend on staffNo ONLY, not on both staffNo and branchNo. Only the hoursPerWeek column
depends on both staffNo and branchNo, which is what we want. In other words, we must avoid
partial dependencies on the candidate keys.
Third normal form (3NF)
A table is in 3NF if it is in 1NF and 2NF and all non-primary-key columns can be worked
out from only the primary key column(s) and no other columns.
At this level, the combined definition of 2NF and 3NF is: a table is in 3NF if each non-key
column depends on all candidate keys, whole candidate keys and nothing but
candidate keys.
For 2NF we remove partial dependencies, and for 3NF we remove transitive
dependencies.
For example, the StaffBranch table is not in 3NF.
The formal definition of 3NF is: a table that is in 1NF and 2NF and in which no
non-primary-key column is transitively dependent on the primary key.
For example, consider a table with columns A, B, and C. If B is functionally dependent on A
(A → B) and C is functionally dependent on B (B → C), then C is transitively
dependent on A via B (provided that A is not functionally dependent on B or C).
If a transitive dependency exists on the primary key, the table is not in 3NF.
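The transitive dependency in the example can be made concrete with a few rows (invented for illustration), where staff_no determines branch_no and branch_no determines branch_address.

```python
# staff_no -> branch_no and branch_no -> branch_address, so
# branch_address is transitively dependent on staff_no.
rows = [
    ("S1", "B01", "12 High St"),
    ("S2", "B01", "12 High St"),   # address stored again: redundancy
    ("S3", "B02", "3 Market Rd"),
]

# Before the 3NF split, branch B01's address is stored twice.
addresses_stored = [a for (_, _, a) in rows]

# After splitting the branch facts into their own relation,
# each branch's address is stored exactly once.
branch = {(b, a) for (_, b, a) in rows}

n_before = len(addresses_stored)  # stored address values in the flat table
n_after = len(branch)             # distinct branch rows after the split
```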
BASIC CONSTRUCT OF ER-MODEL
The ER-model views the real world as a construct of entities and association between entities.
[Figure: Customer entity with attributes Name, Phone No, Cust ID and Contact Add drawn as ovals]
Attributes are shown as oval containing the name of the attribute as in the diagram above.
[Figure: a relationship diamond labelled “buys”]
Named relationships are used to make the ERDs more readable. Relationship names do not
show up in the final database, unlike the entity names.
Cardinality:- The cardinality of a relationship constrains the number of instances of one entity
type that can be associated with a single instance of the other entity type.
Types of cardinality
There are three fundamental types of cardinality in ERDs. They are,
One-to-one
One-to-many
Many-to-many
Example of ER-Diagram
Unit price
Product ID Product
Supplied Cardinality
by
Relationship
Phone No
Name Attribute
Supplier
Address
Cust ID Product ID
Name
Associative Entities
Associative entities:- These are also known as intersection entities. They are used to associate two
or more entities in order to reconcile a many-to-many relationship.
To transform a relationship into an entity on an ERD we use a special symbol called an
associative entity. The notation for an associative entity is a relationship diamond nested inside
an entity rectangle.
A many-to-many relationship without attributes can be transformed into an associative entity
with attributes. From the example above we have the following transformation,
(Diagram: a Customer entity with the attributes Cust ID and Name joined by a ‘‘places’’ relationship to an Order entity.)
From the diagram above, transform the ‘‘contain’’ relationship into an associative entity using
the procedure described earlier.
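The associative-entity idea maps directly onto a junction table in SQL. A sketch with invented Order, Product and OrderLine tables (the relationship's own attribute, quantity, lives on the associative table):

```python
import sqlite3

# OrderLine is the associative table that resolves the many-to-many
# "contain" relationship between Order and Product.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE "Order" (orderID TEXT PRIMARY KEY);
CREATE TABLE Product (productID TEXT PRIMARY KEY);
CREATE TABLE OrderLine (
    orderID   TEXT REFERENCES "Order"(orderID),
    productID TEXT REFERENCES Product(productID),
    quantity  INTEGER,
    PRIMARY KEY (orderID, productID)   -- one line per order/product pair
);
INSERT INTO "Order" VALUES ('O1');
INSERT INTO Product VALUES ('P1');
INSERT INTO Product VALUES ('P2');
INSERT INTO OrderLine VALUES ('O1', 'P1', 2);
INSERT INTO OrderLine VALUES ('O1', 'P2', 1);
""")
total = con.execute(
    "SELECT SUM(quantity) FROM OrderLine WHERE orderID = 'O1'").fetchone()[0]
```

One order can now contain many products, and one product can appear in many orders.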
Difference between logical and physical data models
A logical data model captures general information about entities, their attributes and the
relationships between them; a physical data model describes how that design is implemented in
a specific database.
APPLICATION PACKAGE – MICROSOFT ACCESS
Access is a powerful database management program that can be used for storing,
organizing, retrieving and reporting information. For example, it can be used to manage financial
records, personnel data, inventory information, and personal and professional contacts.
A database is any collection of information that is organized for quick retrieval. For example,
banks use databases to manage customers’ accounts, schools use databases to maintain students’
records, and businesses use databases for payroll, sales and financial records.
Most databases are computer based. A document such as a telephone book can be
considered a manual database because it allows for quick retrieval of information and it is
organized alphabetically.
Computer databases are both flexible and fast. They can save a lot of time compared to manual
databases.
LOADING ACCESS
Click the Start button
Click All Programs
Click Microsoft Office
Click Microsoft Office Access 2007.
Note:- A template is a predefined database consisting of a set of objects, such as tables and forms,
that can be customized with one’s own data.
The Open Recent Database section lists the names of databases you have worked with recently.
The More link shows databases that are not in this list, which can be opened by clicking the link.
The Open command on the Office menu can also be used to open a database. The navigation
pane displays all the objects in the database: tables, forms, queries, reports, macros and
modules.
Note:- The objects displayed in the navigation pane can be narrowed down to just one type.
DATA MODELING
Data modelling is the analysis of data objects that are used in a business or other context and the
identification of the relationships among these data objects. Data modelling is a first step in doing object-
oriented programming.
Data modelling is the formalization and documentation of existing processes and events that occur during
application software design and development. Data modelling techniques and tools capture and translate
complex system designs into easily understood representations of the data flows and processes, creating a
blueprint for construction and/or re-engineering.
Data modellers often use multiple models to view the same data and ensure that all processes, entities,
relationships and data flows have been identified. There are several different approaches to data
modelling, including:
Conceptual Data Modelling - identifies the highest-level relationships between different entities.
Enterprise Data Modelling – similar to conceptual data modelling, but addresses the unique requirements
of a specific business.
Logical Data Modelling - illustrates the specific entities, attributes and relationships involved in a
business function. Serves as the basis for the creation of the physical data model.
Data Integrity
Data integrity refers to the validity of data, meaning data is consistent and correct. In the data
warehousing field, we frequently hear the term, "Garbage In, Garbage Out." If there is no data
integrity in the data warehouse, any resulting report and analysis will not be useful.
In a data warehouse or a data mart, there are three areas where data integrity needs to be
enforced:
Database level
We can enforce data integrity at the database level. Common ways of enforcing data integrity
include:
Referential integrity
The relationship between the primary key of one table and the foreign key of another table must
always be maintained. For example, a row cannot be deleted while a foreign key in another table
still refers to its primary key.
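This can be sketched with Python's `sqlite3` (table names here are made up; note that SQLite only enforces foreign keys once the `foreign_keys` pragma is switched on):

```python
import sqlite3

# Deleting a primary-key row that a foreign key still refers to is rejected.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite needs this enabled per connection
con.executescript("""
CREATE TABLE Branch (branchNo TEXT PRIMARY KEY);
CREATE TABLE Staff (
    staffNo  TEXT PRIMARY KEY,
    branchNo TEXT REFERENCES Branch(branchNo)
);
INSERT INTO Branch VALUES ('B001');
INSERT INTO Staff  VALUES ('S1', 'B001');
""")
try:
    con.execute("DELETE FROM Branch WHERE branchNo = 'B001'")
    deleted = True
except sqlite3.IntegrityError:
    deleted = False   # the delete is refused while Staff still points at B001
```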
Primary key / Unique constraint
Primary keys and the UNIQUE constraint are used to make sure every row in a table can be
uniquely identified.
Not NULL vs. NULL-able
Columns declared NOT NULL may not contain a NULL value.
Valid Values
Only allowed values are permitted in the database. For example, if a column can only have
positive integers, a value of '-1' cannot be allowed.
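Both column-level rules above can be sketched together in `sqlite3` (the Stock table is invented): NOT NULL rejects a missing value, and a CHECK constraint restricts the column to valid values.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE Stock (
    productID TEXT PRIMARY KEY,
    quantity  INTEGER NOT NULL CHECK (quantity >= 0)
)""")
con.execute("INSERT INTO Stock VALUES ('P1', 10)")   # valid row is accepted
rejected = []
for row in [("P2", None), ("P3", -1)]:               # a NULL and a negative value
    try:
        con.execute("INSERT INTO Stock VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])                      # both inserts are refused
```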
ETL process
For each step of the ETL process, data integrity checks should be put in place to ensure that
source data is the same as the data in the destination. Most common checks include record
counts or record sums.
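A minimal sketch of such a check (the data is invented): after a load step, compare the record count and a column sum between the source and the destination.

```python
# Source rows (productID, quantity) and the rows read back from the target.
source = [("P1", 10), ("P2", 25), ("P3", 5)]
target = list(source)        # pretend this came back from the destination table

count_ok = len(source) == len(target)                          # record counts match
sum_ok = sum(q for _, q in source) == sum(q for _, q in target)  # record sums match
assert count_ok and sum_ok   # both checks pass when nothing was lost in transit
```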
Access level
We need to ensure that data is not altered by any unauthorized means either during the ETL
process or in the data warehouse. To do this, there needs to be safeguards against unauthorized
access to data (including physical access to the servers), as well as logging of all data access
history. Data integrity can only be ensured if there is no unauthorized access to the data.
Data Integrity
Data integrity means, in part, that you can correctly and consistently navigate and manipulate the
tables in the database. There are two basic rules to ensure data integrity; entity integrity and
referential integrity.
The entity integrity rule states that the value of the primary key can never be a null value (a null
value is one that has no value and is not the same as a blank). Because a primary key is used to
identify a unique row in a relational table, its value must always be specified and should never
be unknown. The integrity rule requires that insert, update, and delete operations maintain the
uniqueness and existence of all primary keys.
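The entity integrity rule can be seen in action with `sqlite3` (the Customer table is made up; NOT NULL is declared explicitly because SQLite historically allows NULL in some primary key columns without it):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Customer (custID TEXT NOT NULL PRIMARY KEY, name TEXT)")
try:
    con.execute("INSERT INTO Customer VALUES (NULL, 'Ada')")
    accepted = True
except sqlite3.IntegrityError:
    accepted = False   # the row is refused: a primary key may never be null
```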
The referential integrity rule states that if a relational table has a foreign key, then every value of
the foreign key must either be null or match the values in the relational table in which that
foreign key is a primary key.