CHAPTER-1: INTRODUCTION
This document describes the procedure required to migrate a database from Windows to Linux using the RMAN Convert Database command. Both Windows and Linux platforms have the same endian format, which makes it possible to transfer the whole database and keeps the migration process straightforward and simple. To migrate between platforms that have different endian formats, Cross Platform Transportable Tablespaces (XTTS) needs to be used instead.
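As an illustration only, such a same-endian migration can be sketched with the RMAN CONVERT DATABASE command; the database name, script path, platform string and file locations below are hypothetical and would have to match the real source and target systems (the platform string must appear in V$TRANSPORTABLE_PLATFORM):

RMAN> CONVERT DATABASE NEW DATABASE 'orcl'
        TRANSPORT SCRIPT '/tmp/convert_orcl.sql'
        TO PLATFORM 'Linux x86 64-bit'
        DB_FILE_NAME_CONVERT 'D:\oracle\oradata\orcl\' '/u01/app/oracle/oradata/orcl/';

The command is run on the source host with the database opened read-only; the generated transport script is then used to bring the database up on the Linux server.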
For IT managers, database migration has become one of the most routine and challenging facts of life. Workload balancing, technology refresh, server and storage consolidation, data center relocation, data classification, and mergers/acquisitions are all drivers to move data from one storage device to another on a regular basis. You'd think that any operation performed routinely would become easy, but it is not as simple as it seems. Migration, which means moving data from one device to another (especially as the capacity of storage devices continues to grow) and then redirecting all I/O to the new device, is an inherently disruptive process. With the increase in the percentage of mission-critical data and the proportionate increase in data availability demands, downtime, with its huge impact on a company's financial bottom line, becomes unacceptable. In addition, business, technical and operational requirements impose challenging restrictions on the migration process itself. Resource demands (staff, CPU cycles, and bandwidth) and risks (application downtime, performance impact to production environments, technical incompatibilities, and data corruption/loss) make migration one of IT's biggest challenges.
1.1 DATA
The term data means groups of information that represent the qualitative or quantitative
attributes of a variable or set of variables. Data (plural of "datum", which is seldom used) are
typically the results of measurements and can be the basis of graphs, images, or observations of a
set of variables. Data are often viewed as the lowest level of abstraction from which information
and knowledge are derived. Raw data refers to a collection of numbers, characters, images or other outputs from devices that collect information to convert physical quantities into symbols, and that have not yet been processed. Such data
is typically further processed by a human or input into a computer, stored and processed there, or
transmitted (output) to another human or computer (possibly through a data cable). Raw data is a
relative term; data processing commonly occurs by stages, and the "processed data" from one
stage may be considered the "raw data" of the next. The terms information and knowledge are
frequently used for overlapping concepts. The main difference is in the level of abstraction being
considered. Data is the lowest level of abstraction, information is the next level, and finally,
knowledge is the highest level among all three.
1.2 INFORMATION
CHAPTER-2: INTRODUCTION TO DATABASE
DATABASE
A database consists of an organized collection of data for one or more uses. Databases consist of software-based "containers" that are structured to collect and store information so users can retrieve, add, update or remove such information in an automatic fashion. They do so through the use of tables. Database programs are designed so that users can add or delete any information needed. The structure of a database is tabular, consisting of rows and columns of information.
APPLICATIONS
Databases function in many applications, spanning virtually the entire range of computer
software. Databases have become the preferred method of storage for large multiuser
applications, where coordination between many users is needed. Even individual users find them
convenient, and many electronic mail programs and personal organizers are based on standard
database technology. Software database drivers are available for most database platforms so that
application software can use a common API to retrieve the information stored in a database.
Commonly used database APIs include JDBC and ODBC.
Within new media, databases collect items on which the user can carry out various operations
such as viewing, navigating, creating, and searching. Though there are various types of items
within the database, each item has the same significance. Unlike a narrative or film, the
collections are computerized and therefore may offer a unique experience with each view. This
form of data may present a unique presentation of what the world is like. Databases can be seen
as a symbolic form of the computer age.
2.1 COMPONENTS OF DATABASE
A database is developed with several components. All these components are linked to each other. Each has its own role to play in the development and working of the database. Some of the main components used in developing and working with a database are:
COLUMNS
ROWS
TABLES
KEYS
FORMS
REPORTS
QUERY
Tables, rows and columns form the building blocks of a database. They store the data that we want to save in our database.
2.1.1 COLUMNS
Columns are akin to fields, that is, individual items of data that we wish to store. A customer's name, the price of a part, and the date of an invoice are all examples of columns. They are also similar to the columns found in spreadsheets (the A, B, C, etc. along the top).
2.1.2 ROWS
Rows are akin to records as they contain data for multiple columns (like the 1, 2, 3, etc. in a spreadsheet). Unlike file records though, it is possible to extract only the columns you want to make up a row of data. Old "records" that computers read forced the computer to read everything, even if you only wanted a tiny portion of the record. In databases, a row can be made up of as many or as few columns as you want. This makes reading data much more efficient - you fetch only what you want.
2.1.3 TABLE
A table is a logical group of columns. For example, you may have a table that stores details of customers' names and addresses. Another table would be used to store details of parts and yet another would be used for suppliers' names and addresses. It is the tables that make up the entire database and it is important that we do not duplicate data at all. Only key values would be duplicated (and even then, on some tables these would be unique).
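As a small illustration (the table and column names here are invented for the example), the SQL below defines a table whose columns describe a customer and inserts one row:

CREATE TABLE customers (
    customer_id   NUMBER PRIMARY KEY,   -- a column that uniquely identifies each row
    customer_name VARCHAR2(100),        -- a column: the customer's name
    city          VARCHAR2(50)          -- a column: the customer's city
);

INSERT INTO customers (customer_id, customer_name, city)
VALUES (1, 'A. Smith', 'Leeds');        -- one row of the table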
2.1.4 KEYS
A key is a single field or a combination of fields. Its purpose is to access or retrieve data rows from a table according to the requirement. Keys are defined in tables to access or sequence the stored data quickly and smoothly. They are also used to create links between different tables.
TYPES OF KEYS
The different types of keys are described below.
PRIMARY KEY
A primary key is a value that can be used to identify a unique row in a table; the other attributes of the row are associated with it. Examples of primary keys are Social Security numbers (associated with a specific person) or ISBNs (associated with a specific book). In the relational model of data, a primary key is a candidate key chosen as the main method of uniquely identifying a tuple in a relation.
SECONDARY KEY
A field or combination of fields that is the basis for retrieval is known as a secondary key. A secondary key is a non-unique field; one secondary key value may refer to many records.
COMPOSITE KEY
A primary key that consists of two or more attributes is known as a composite key.
CANDIDATE KEY
A candidate key is a field or combination of fields that can act as the primary key for a table, uniquely identifying each record in that table.
ALTERNATE KEY
An alternate key is any candidate key which is not selected to be the primary key.
SUPER KEY
A super key is defined in the relational model as a set of attributes of a relation variable (relvar)
for which it holds that in all relations assigned to that variable there are no two distinct tuples
(rows) that have the same values for the attributes in this set. Equivalently a superkey can also be
defined as a set of attributes of a relvar upon which all attributes of the relvar are functionally
dependent.
FOREIGN KEY
A foreign key (FK) is a field or group of fields in a database record that points to a key field or
group of fields forming a key of another database record in some (usually different) table.
Usually a foreign key in one table refers to the primary key (PK) of another table. In this way references can be made to link information together, and this is an essential part of database normalization.
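A minimal sketch of these two kinds of key, reusing the invented customers table from earlier: the primary key uniquely identifies each order, and the foreign key points back to a row of the customers table.

CREATE TABLE orders (
    order_id    NUMBER PRIMARY KEY,                          -- primary key of this table
    customer_id NUMBER REFERENCES customers (customer_id),   -- foreign key to the customers table
    order_date  DATE
);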
2.1.5 FORMS
Forms are used to enter or look at data. They are the screens that make up the application that
allows us to enter and retrieve data. Each "field" on a form usually relates to a column on the
database but this is not always the case.
For example, suppose you have a field that shows a total of two or more items on a form. The total would NOT be stored on the database. Why? Because if one of the columns were incorrectly updated, you wouldn't know which one was incorrect. For example, suppose you had 3 columns on a table - A, B and C - where A is equal to B + C, and suppose the columns contain the following values:
A = 11
B=7
C=5
Which column is wrong? You might think that A is wrong, but perhaps B should be equal to 6 or
maybe C should be 4. You just don't know. This is why derived items are not stored on the
database but can be shown on a form. The form should have some form of validation so that the
user cannot enter "rubbish" onto the database. Numbers must store numbers; dates should store
dates and so on. Depending on the application, there can be some very complex validation.
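Following this reasoning, a derived value such as the total is computed when it is needed rather than stored; a small sketch, with an invented table name:

-- B and C are stored columns; A is derived at query time, not stored
SELECT b, c, b + c AS a
FROM   example_totals;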
2.1.6 REPORTS
It is all very well placing all this data into the database, but if you cannot access it in a format
that you want (usually on paper), then it is extremely limited in use. Every database should have
some kind of reporting facility and I usually find that the report side of things is what makes a
good (or bad) database application/package. Reports should be easy and quick to produce.
Simple reports should be so easy to produce that anyone can do it. More complex reports will
need someone with programming/database knowledge, as it is possible to produce really silly
results.
2.1.7 QUERY
Queries are little programs that let the user ask questions, such as "Give me a list of all the customers who ordered something last month" or "List all customers in a specific area". Queries can be, and often are, the foundation of reports, in that a report can be based on the results of a query. Depending on the database, queries can be constructed on screen, or you may even be able to type in a query command directly. Queries often use a language called SQL (Structured Query Language), which is a computer language that makes accessing databases simple. These, then, are the general components that make up a database.
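For example, the first of those questions might be written in SQL roughly as follows, reusing the invented customers and orders tables sketched earlier:

SELECT DISTINCT c.customer_name
FROM   customers c
JOIN   orders o ON o.customer_id = c.customer_id
WHERE  o.order_date >= ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -1)   -- first day of last month
AND    o.order_date <  TRUNC(SYSDATE, 'MM');                  -- first day of this month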
There are various kinds of databases in the computer world. They are differentiated by their functionality and by how they model the data.
OPERATIONAL DATABASE
These databases store detailed data needed to support the operations of an entire organization.
They are also called subject-area databases (SADB), transaction databases, and production
databases. For example:
Customer database
Personal database
Inventory database
Accounting database
ANALYTICAL DATABASE
Analytic databases (a.k.a. OLAP- On Line Analytical Processing) are primarily static, read-only
databases which store archived, historical data used for analysis. For example, a company might
store sales records over the last ten years in an analytic database and use that database to analyze
marketing strategies in relationship to demographics. On the web, you will often see analytic databases in the form of inventory catalogs, such as the product catalog of an online store like Amazon.com.
An inventory catalog analytical database usually holds descriptive information about all
available products in the inventory. Web pages are generated dynamically by querying the list of
available products in the inventory against some search parameters. The dynamically-generated
page will display the information about each item (such as title, author, ISBN) which is stored in
the database.
DATA WAREHOUSE
A data warehouse stores current and historical data extracted from the various operational databases of an organization. It becomes the central source of data that has been screened, edited, standardized and integrated so that it can be used by managers and other end-user professionals throughout an organization. Data warehouses are characterized by being slow
to insert into but fast to retrieve from. Recent developments in data warehousing have led to the
use of a Shared nothing architecture to facilitate extreme scaling.
DISTRIBUTED DATABASE
These are databases of local work-groups and departments at regional offices, branch offices,
manufacturing plants and other work sites. These databases can include segments of both
common operational and common user databases, as well as data generated and used only at a
user’s own site.
END-USER DATABASE
These databases consist of a variety of data files developed by end-users at their workstations.
Examples of these are collections of documents in spreadsheets, word processing and even
downloaded files.
EXTERNAL DATABASE
These databases provide access to external, privately-owned data online, available for a fee to end-users and organizations from commercial services. Access to a wealth of information from external databases is available for a fee from commercial online services, and with or without charge from many sources on the Internet.
HYPERMEDIA DATABASES ON THE WEB
These are a set of interconnected multimedia pages at a web site. They consist of a home page and other hyperlinked pages of multimedia or mixed media such as text, graphics, photographic images, video clips, audio, etc.
NAVIGATIONAL DATABASE
In navigational databases, queries find objects primarily by following references from other
objects. Traditionally navigational interfaces are procedural, though one could characterize some
modern systems like XPath as being simultaneously navigational and declarative.
IN-MEMORY DATABASES
In-memory databases primarily rely on main memory for computer data storage. This contrasts
with database management systems which employ a disk-based storage mechanism. Main
memory databases are faster than disk-optimized databases since the internal optimization
algorithms are simpler and execute fewer CPU instructions. Accessing data in memory provides
faster and more predictable performance than disk. In applications where response time is
critical, such as telecommunications network equipment that operates emergency systems, main
memory databases are often used.
DOCUMENT-ORIENTED DATABASES
A document-oriented database stores semi-structured information as documents; each document is a record that has certain characteristics. Any number of fields of any length can be added to a document. Fields can also contain multiple pieces of data.
REAL-TIME DATABASES
A real-time database is a processing system designed to handle workloads whose state may
change constantly. This differs from traditional databases containing persistent data, mostly
unaffected by time. For example, a stock market changes rapidly and dynamically. Real-time
processing means that a transaction is processed fast enough for the result to come back and be
acted on right away. Real-time databases are useful for accounting, banking, law, medical
records, multi-media, process control, reservation systems, and scientific data analysis. As
computers increase in power and can store more data, real-time databases become integrated into
society and are employed in many applications.
HIERARCHICAL DATABASES
As its name implies, the Hierarchical Database Model defines hierarchically-arranged data.
Perhaps the most intuitive way to visualize this type of relationship is by visualizing an upside
down tree of data. In this tree, a single table acts as the "root" of the database from which other
tables "branch" out. You will be instantly familiar with this relationship because that is how all
windows-based directory management systems (like Windows Explorer) work these days.
Relationships in such a system are thought of in terms of children and parents such that a child
may only have one parent but a parent can have multiple children. Parents and children are tied
together by links called "pointers" (perhaps physical addresses inside the file system). A parent
will have a list of pointers to each of their children.
This child/parent rule assures that data is systematically accessible. To get to a low-level table,
you start at the root and work your way down through the tree until you reach your target. Of
course, as you might imagine, one problem with this system is that the user must know how the
tree is structured in order to find anything.
NETWORK DATABASES
In many ways, the Network Database model was designed to solve some of the more serious
problems with the Hierarchical Database Model. Specifically, the Network model solves the
problem of data redundancy by representing relationships in terms of sets rather than hierarchy.
The model had its origins in the Conference on Data Systems Languages (CODASYL) which
had created the Data Base Task Group to explore and design a method to replace the hierarchical
model. The network model is very similar to the hierarchical model actually. In fact, the
hierarchical model is a subset of the network model. However, instead of using a single-parent
tree hierarchy, the network model uses set theory to provide a tree-like hierarchy with the
exception that child tables were allowed to have more than one parent. This allowed the network
model to support many-to-many relationships. Visually, a Network Database looks like a
hierarchical Database in that you can see it as a type of tree. However, in the case of a Network
Database, the look is more like several trees which share branches. Thus, children can have
multiple parents and parents can have multiple children.
Nevertheless, though it was a dramatic improvement, the network model was far from perfect.
Most profoundly, the model was difficult to implement and maintain. Most implementations of
the network model were used by computer programmers rather than real users. What was needed
was a simple model which could be used by real end users to solve real problems.
The relational model met this need and has become the dominant model for mainstream business applications. There are also other types of databases which cannot be classified as relational
databases. Most notable is the object database management system, which stores language
objects natively without using a separate data definition language and without translating into a
separate storage schema. Unlike relational systems, these object databases store the relationship
between complex data types as part of their storage model in a way that does not require runtime
calculation of related data using relational algebra execution algorithms.
Data are actually stored as bits, or numbers and strings, but it is difficult to work with data at this
level. It is necessary to view data at different levels of abstraction.
Schema: a description of the data at some level. Each level has its own schema. The three levels are:
Physical
Conceptual
External
PHYSICAL DATA LEVEL
The physical schema describes details of how data is stored: files, indices, etc. on the random access disk system. It also typically describes the record layout of files and the type of files (hash, b-tree, flat). Early applications worked at this level and explicitly dealt with such details, e.g., minimizing physical distances between related data and organizing the data structures within the file (blocked records, linked lists of blocks, etc.).
Problem: application code becomes tied to the physical representation of the data, so any change to the data structures requires changing the applications.
CONCEPTUAL DATA LEVEL
In the relational model, the conceptual schema presents data as a set of tables.
The DBMS maps data access between the conceptual and physical schemas automatically. The physical schema can be changed without changing the application: the DBMS must simply change its mapping from the conceptual level to the physical level. This is referred to as physical data independence.
EXTERNAL DATA LEVEL
In the relational model, the external schema also presents data as a set of relations. An external
schema specifies a view of the data in terms of the conceptual level. It is tailored to the needs of
a particular category of users. Portions of the stored data should not be seen by some users; hiding them begins to implement a level of security and simplifies the view for these users.
Examples:
Information that can be derived from stored data might be viewed as if it were stored.
Applications are written in terms of an external schema. The external view is computed when
accessed. It is not stored. Different external schemas can be provided to different categories of
users. Translation from the external level to the conceptual level is done automatically by the DBMS at run time. The conceptual schema can be changed without changing the application: the DBMS changes its mapping from the external level to the conceptual level. This is referred to as logical data independence.
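One simple way to picture an external schema in SQL is as a view: the restricted or derived data looks as if it were stored, but it is computed from the conceptual-level tables when it is accessed (names invented):

-- An external view for one category of users, computed from the customers table when queried
CREATE VIEW customer_directory AS
SELECT customer_name, city
FROM   customers;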
2.4 DATA MODELS
The development of data models has been the cornerstone of the evolution of databases. As models have advanced, so has database efficiency.
2.4.1 FLAT MODEL
The flat (or table) model consists of a single, two-dimensional array of data elements, where all
members of a given column are assumed to be similar values, and all members of a row are
assumed to be related to one another. For instance, columns for name and password that might be
used as a part of a system security database. Each row would have the specific password
associated with an individual user. Columns of the table often have a type associated with them,
defining them as character data, date or time information, integers, or floating point numbers.
This may not strictly qualify as a data model, as defined above.
2.4.2 HIERARCHICAL MODEL
In a hierarchical model, data is organized into a tree-like structure, implying a single upward link
in each record to describe the nesting, and a sort field to keep the records in a particular order in
each same-level list. Hierarchical structures were widely used in the early mainframe database
management systems, such as the Information Management System (IMS) by IBM, and now
describe the structure of XML documents. This structure allows one 1:N relationship between
two types of data. This structure is very efficient to describe many relationships in the real world;
recipes, table of contents, ordering of paragraphs/verses, any nested and sorted information.
However, the hierarchical structure is inefficient for certain database operations when a full path
(as opposed to upward link and sort field) is not also included for each record. Parent–child
relationship: Child may only have one parent but a parent can have multiple children. Parents
and children are tied together by links called "pointers". A parent will have a list of pointers to
each of their children.
2.4.3 NETWORK MODEL
The network model (defined by the CODASYL specification) organizes data using two
fundamental constructs, called records and sets. Records contain fields (which may be organized
hierarchically, as in the programming language COBOL). Sets (not to be confused with
mathematical sets) define one-to-many relationships between records: one owner, many
members. A record may be an owner in any number of sets, and a member in any number of sets.
The network model is a variation on the hierarchical model, to the extent that it is built on the
concept of multiple branches (lower-level structures) emanating from one or more nodes (higher-
level structures), while the model differs from the hierarchical model in that branches can be
connected to multiple nodes. The network model is able to represent redundancy in data more
efficiently than in the hierarchical model. The operations of the network model are navigational
in style: a program maintains a current position, and navigates from one record to another by
following the relationships in which the record participates. Records can also be located by
supplying key values. Although it is not an essential feature of the model, network databases
generally implement the set relationships by means of pointers that directly address the location
of a record on disk. This gives excellent retrieval performance, at the expense of operations such
as database loading and reorganization. Most object databases use the navigational concept to
provide fast navigation across networks of objects, generally using object identifiers as "smart"
pointers to related objects. Objectivity/DB, for instance, implements named 1:1, 1:many, many:1, and many:many relationships that can cross databases. Many object databases also
support SQL, combining the strengths of both models.
2.4.4 RELATIONAL MODEL
The relational model was introduced by E.F. Codd in 1970 as a way to make database
management systems more independent of any particular application. It is a mathematical model
defined in terms of predicate logic and set theory. The products that are generally referred to as
relational databases in fact implement a model that is only an approximation to the mathematical
model defined by Codd. Three key terms are used extensively in relational database models:
Relations, attributes, and domains. A relation is a table with columns and rows. The named
columns of the relation are called attributes, and the domain is the set of values the attributes are
allowed to take. All relations (and, thus, tables) in a relational database have to adhere to some
basic rules to qualify as relations. First, the ordering of columns is immaterial in a table. Second,
there can't be identical tuples or rows in a table. And third, each tuple will contain a single value
for each of its attributes.
A relational database contains multiple tables, each similar to the one in the "flat" database
model. One of the strengths of the relational model is that, in principle, any value occurring in
two different records (belonging to the same table or to different tables), implies a relationship
among those two records. Yet, in order to enforce explicit integrity constraints, relationships
between records in tables can also be defined explicitly, by identifying or non-identifying parent-
child relationships characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables can also
have a designated single attribute or a set of attributes that can act as a "key", which can be used
to uniquely identify each tuple in the table. A key that can be used to uniquely identify a row in a
table is called a primary key. Keys are commonly used to join or combine data from two or more
tables. For example, an Employee table may contain a column named Location which contains a
value that matches the key of a Location table. Keys are also critical in the creation of indexes,
which facilitate fast retrieval of data from large tables. Any column can be a key, or multiple
columns can be grouped together into a compound key. It is not necessary to define all the keys
in advance; a column can be used as a key even if it was not originally intended to be one. A key
that has an external, real-world meaning (such as a person's name, a book's ISBN, or a car's serial
number) is sometimes called a "natural" key. If no natural key is suitable (think of the many
people named Brown), an arbitrary or surrogate key can be assigned (such as by giving
employees ID numbers). In practice, most databases have both generated and natural keys,
because generated keys can be used internally to create links between rows that cannot break,
while natural keys can be used, less reliably, for searches and for integration with other
databases. (For example, records in two independently developed databases could be matched up
by social security number, except when the social security numbers are incorrect, missing, or
have changed.)
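The Employee/Location example above might be sketched as a key-based join; the table and column names here are invented:

CREATE TABLE locations (
    location_id   NUMBER PRIMARY KEY,
    location_name VARCHAR2(100)
);

CREATE TABLE employees (
    employee_id   NUMBER PRIMARY KEY,
    employee_name VARCHAR2(100),
    location_id   NUMBER REFERENCES locations (location_id)   -- matches the key of the locations table
);

-- Combine data from the two tables through the key
SELECT e.employee_name, l.location_name
FROM   employees e
JOIN   locations l ON l.location_id = e.location_id;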
The dimensional model is a specialized adaptation of the relational model used to represent data
in data warehouses in a way that data can be easily summarized using OLAP queries. In the
dimensional model, a database consists of a single large table of facts that are described using
dimensions and measures. A dimension provides the context of a fact (such as who participated,
when and where it happened, and its type) and is used in queries to group related facts together.
Dimensions tend to be discrete and are often hierarchical; for example, the location might
include the building, state, and country. A measure is a quantity describing the fact, such as
revenue. It's important that measures can be meaningfully aggregated - for example, the revenue
from different locations can be added together. In an OLAP query, dimensions are chosen and
the facts are grouped and added together to create a summary. The dimensional model is often
implemented on top of the relational model using a star schema, consisting of one table
containing the facts and surrounding tables containing the dimensions. Particularly complicated
dimensions might be represented using multiple tables, resulting in a snowflake schema. A data
warehouse can contain multiple star schemas that share dimension tables, allowing them to be
used together. Coming up with a standard set of dimensions is an important part of dimensional
modeling.
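A minimal star-schema sketch (the fact and dimension names are invented): the fact table references the dimension table, and an OLAP-style query groups the facts by a dimension attribute and aggregates the measure.

CREATE TABLE dim_location (
    location_id NUMBER PRIMARY KEY,
    city        VARCHAR2(50),
    country     VARCHAR2(50)
);

CREATE TABLE fact_sales (
    sale_id     NUMBER PRIMARY KEY,
    location_id NUMBER REFERENCES dim_location (location_id),   -- dimension
    revenue     NUMBER                                          -- measure
);

-- Summarize the measure by a dimension attribute
SELECT d.country, SUM(f.revenue) AS total_revenue
FROM   fact_sales f
JOIN   dim_location d ON d.location_id = f.location_id
GROUP  BY d.country;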
In recent years, the object-oriented paradigm has been applied to database technology, creating a
new programming model known as object databases. These databases attempt to bring the
database world and the application programming world closer together, in particular by ensuring
that the database uses the same type system as the application program. This aims to avoid the
overhead (sometimes referred to as the impedance mismatch) of converting information between
its representation in the database (for example as rows in tables) and its representation in the
application program (typically as objects). At the same time, object databases attempt to
introduce the key ideas of object programming, such as encapsulation and polymorphism, into
the world of databases. A variety of ways have been tried for storing objects in a database.
Some products have approached the problem from the application programming end, by making
the objects manipulated by the program persistent. This also typically requires the addition of
some kind of query language, since conventional programming languages do not have the ability
to find objects based on their information content. Others have attacked the problem from the
database end, by defining an object-oriented data model for the database, and defining a database
programming language that allows full programming capabilities as well as traditional query.
Object databases suffered because of a lack of standardization: although standards were defined
by ODMG, they were never implemented well enough to ensure interoperability between
products. Nevertheless, object databases have been used successfully in many applications:
usually specialized applications such as engineering databases or molecular biology databases
rather than mainstream commercial data processing. However, object database ideas were picked
up by the relational vendors and influenced extensions made to these products and indeed to the
SQL language.
Database design is the process of producing a detailed data model of a database. This logical data
model contains all the needed logical and physical design choices and physical storage
parameters needed to generate a design in a Data Definition Language, which can then be used to
create a database. A fully attributed data model contains detailed attributes for each entity. The
term database design can be used to describe many different parts of the design of an overall
database system. Principally, and most correctly, it can be thought of as the logical design of the
base data structures used to store the data. In the relational model these are the tables and views.
In an object database the entities and relationships map directly to object classes and named
relationships. However, the term database design could also be used to apply to the overall
process of designing, not just the base data structures, but also the forms and queries used as part
of the overall database application within the database management system (DBMS). The
process of doing database design generally consists of a number of steps which will be carried out by the database designer. Usually, the designer must determine the relationships between the different data elements and superimpose a logical structure upon the data on the basis of those relationships. Typical steps are:
Determine the purpose of your database - This helps prepare you for the remaining
steps.
Find and organize the information required - Gather all of the types of information
you might want to record in the database, such as product name and order number.
Divide the information into tables - Divide your information items into major
entities or subjects, such as Products or Orders. Each subject then becomes a table.
Turn information items into columns - Decide what information you want to store
in each table. Each item becomes a field, and is displayed as a column in the table.
For example, an Employees table might include fields such as Last Name and Hire
Date.
Specify primary keys - Choose each table’s primary key. The primary key is a
column that is used to uniquely identify each row. An example might be Product ID
or Order ID.
Set up the table relationships - Look at each table and decide how the data in one
table is related to the data in other tables. Add fields to tables or create new tables to
clarify the relationships, as necessary.
Refine your design - Analyze your design for errors. Create the tables and add a few
records of sample data. See if you can get the results you want from your tables.
Make adjustments to the design, as needed.
Apply the normalization rules - Apply the data normalization rules to see if your
tables are structured correctly. Make adjustments to the tables, as needed.
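As a sketch of the "divide into tables, specify primary keys, set up relationships" steps (all names invented, reusing the orders table sketched in 2.1.4), a many-to-many relationship between products and orders can be clarified with an additional table:

CREATE TABLE products (
    product_id   NUMBER PRIMARY KEY,
    product_name VARCHAR2(100)
);

-- A junction table clarifies the many-to-many relationship between the two subjects
CREATE TABLE order_items (
    order_id   NUMBER REFERENCES orders (order_id),
    product_id NUMBER REFERENCES products (product_id),
    quantity   NUMBER,
    PRIMARY KEY (order_id, product_id)
);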
CONCEPTUAL SCHEMA
Once a database designer is aware of the data which is to be stored within the database, they
must then determine where the dependencies are within the data. Sometimes when data is changed you can be changing other data that is not visible. For example, in a list of names and addresses, assuming a situation where multiple people can have the same address but one person cannot have more than one address, the address is dependent upon the name: knowing the name determines the address. The other way around does not hold, because knowing an address does not determine a single name; one attribute can change without the other changing.
Once the relationships and dependencies amongst the various pieces of information have been
determined, it is possible to arrange the data into a logical structure which can then be mapped
into the storage objects supported by the database management system. In the case of relational
databases the storage objects are tables which store data in rows and columns. Each table may
represent an implementation of either a logical object or a relationship joining one or more
instances of one or more logical objects. Relationships between tables may then be stored as
links connecting child tables with parents. Since complex logical relationships are themselves
tables they will probably have links to more than one parent.
The physical design of the database specifies the physical configuration of the database on the
storage media. This includes detailed specification of data elements, data types, indexing options
and other parameters residing in the DBMS data dictionary. It is the detailed design of a system that includes the modules and the database's hardware and software specifications.
Database Management System (DBMS) consists of software that organizes the storage of data.
A DBMS controls the creation, maintenance, and use of the database storage structures of organizations and of their end users. It allows organizations to place control of organization-wide
database development in the hands of Database Administrators (DBAs) and other specialists. In
large systems, a DBMS allows users and other software to store and retrieve data in a structured
way. Database management systems are usually categorized according to the database model that
they support, such as the network, relational or object model. The model tends to determine the
query languages that are available to access the database. One commonly used query language
for the relational database is SQL, although SQL syntax and function can vary from one DBMS
to another. A common query language for the object database is OQL; although not all vendors of object databases implement it, the majority of them do. A great deal of
the internal engineering of a DBMS is independent of the data model, and is concerned with
managing factors such as performance, concurrency, integrity, and recovery from hardware
failures. In these areas there are large differences between the products. The database holds the
records, fields, cells with data. The DBMS is the "tools" used to manipulate the data in the
database. An example of a DBMS is SQL Server. The DBMS is the database engine itself. This
is the core, the actual program that contains the logic to execute your SQL statements and performs the actual data access based upon these commands.
DBMS Engine accepts logical requests from the various other DBMS subsystems,
converts them into physical equivalents, and actually accesses the database and data
dictionary as they exist on a storage device.
Data Definition Subsystem helps users create and maintain the data dictionary and define the structure of the files in a database.
Data Manipulation Subsystem helps users add, change, and delete information in a database and query it for valuable information. Software tools within the data manipulation subsystem are most often the primary interface between the user and the information contained in a database. It allows users to specify their logical information requirements.
Application Generation Subsystem contains facilities to help users develop transaction-intensive applications. It usually requires that the user perform a detailed series of tasks to process a transaction. It facilitates easy-to-use data entry screens, programming languages, and interfaces.
Data Administration Subsystem helps users to manage the overall database
environment by providing facilities for backup and recovery, security management,
query optimization, concurrency control, and change management.
Database Development: used to define and organize the content, relationships, and
structure of the data needed to build a database.
Database Interrogation: can access the data in a database for information retrieval and
report generation. End users can selectively retrieve and display information and
produce printed reports and documents.
Database Maintenance: used to add, delete, update, correct, and protect the data in a
database.
Application Development: used to develop prototypes of data entry screens, queries, forms, reports, tables, and labels for a prototyped application, or to use a 4GL (fourth-generation language) or an application generator to develop program code.
Alternatively, and especially in connection with the relational model of database management,
the relation between attributes drawn from a specified set of domains can be seen as being
primary. For instance, the database might indicate that a car that was originally "red" might fade
to "pink" in time, provided it was of some particular "make" with an inferior paint job. Such
higher entity relationships provide information on all of the underlying domains at the same
time, with none of them being privileged above the others. Thus, the DBMSs of today roll
together frequently needed services or features of attribute management. By externalizing such
functionality to the DBMS, applications effectively share code with each other and are relieved
of much internal complexity. Features commonly offered by database management systems:
QUERY ABILITY
Querying is the process of requesting attribute information from various perspectives and
combinations of factors. Example: "How many 2-door cars in Texas are green?" A
database query language and report writer allow users to interactively interrogate the database, analyze its data and update it according to the user's privileges on the data.
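The example question might look like this in SQL; the cars table and its columns are invented for the illustration:

CREATE TABLE cars (
    car_id NUMBER PRIMARY KEY,
    doors  NUMBER,
    state  VARCHAR2(2),
    color  VARCHAR2(20)
);

-- "How many 2-door cars in Texas are green?"
SELECT COUNT(*)
FROM   cars
WHERE  doors = 2
AND    state = 'TX'
AND    color = 'Green';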
BACKUP AND REPLICATION
Copies of attributes need to be made regularly in case primary disks or other equipment fails. A periodic copy of attributes may also be created for a distant organization that cannot readily access the original. DBMS usually provide utilities to facilitate the process of extracting and disseminating attribute sets. When data is replicated between database servers, so that the information remains consistent throughout the database system and users cannot tell or even know which server in the DBMS they are using, the system is said to exhibit replication transparency.
RULE ENFORCEMENT
Often one wants to apply rules to attributes so that the attributes are clean and reliable. For
example, we may have a rule that says each car can have only one engine associated with it
(identified by Engine Number). If somebody tries to associate a second engine with a given car,
we want the DBMS to deny such a request and display an error message. However, with changes
in the model specification such as, in this example, hybrid gas-electric cars, rules may need to
change. Ideally such rules should be able to be added and removed as needed without significant
data layout redesign.
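One simple way to express such a rule declaratively (reusing the invented cars table from the earlier example) is a uniqueness constraint, so that a second engine for the same car is rejected by the DBMS:

CREATE TABLE engines (
    engine_number NUMBER PRIMARY KEY,
    car_id        NUMBER NOT NULL REFERENCES cars (car_id),
    CONSTRAINT one_engine_per_car UNIQUE (car_id)   -- a second engine for the same car is rejected
);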
SECURITY
Often it is desirable to limit who can see or change which attributes or groups of attributes. This
may be managed directly by individual, or by the assignment of individuals and privileges to
groups, or (in the most elaborate models) through the assignment of individuals and groups to
roles which are then granted entitlements.
COMPUTATION
There are common computations requested on attributes such as counting, summing, averaging,
sorting, grouping, cross-referencing, etc. Rather than have each computer application implement
these from scratch, they can rely on the DBMS to supply such calculations.
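For instance, counting and grouping can be pushed down to the DBMS rather than re-implemented in each application (again using the invented cars table):

SELECT color, COUNT(*) AS number_of_cars
FROM   cars
GROUP  BY color;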
CHANGE AND ACCESS LOGGING
Often one wants to know who accessed what attributes, what was changed, and when it was
changed. Logging services allow this by keeping a record of access occurrences and changes.
AUTOMATED OPTIMIZATION
If there are frequently occurring usage patterns or requests, some DBMS can adjust themselves
to improve the speed of those interactions. In some cases the DBMS will merely provide tools to
monitor performance, allowing a human expert to make the necessary adjustments after
reviewing the statistics collected.
RDBMS stands for Relational Database Management System. RDBMS data is structured in database tables, fields and records. Each RDBMS table consists of database table rows, and each database table row consists of one or more database table fields. An RDBMS stores the data in a collection of tables, which might be related by common fields (database table columns). An RDBMS also provides relational operators to manipulate the data stored in the database tables. Most RDBMS use SQL as their database query language. Edgar Codd introduced the relational database model. Many modern DBMS do not conform to Codd's definition of an RDBMS, but nonetheless they are still considered to be RDBMS. The most popular RDBMS are MS SQL Server, DB2, Oracle and MySQL.
INTERFACE DRIVERS - This component provides methods to prepare statements, execute statements, fetch results, etc. Examples include DDL, DCL, DML, ODBC, and JDBC. Some vendors provide language-specific proprietary interfaces; for example, MySQL and Firebird provide drivers for PHP and Python.
SQL ENGINE - This component interprets and executes the SQL query. It comprises three
major components (compiler, optimizer, and execution engine).
STORAGE ENGINE - This component stores and retrieves data records. It also provides a
mechanism to store metadata and control information such as undo logs, redo logs, lock
tables, etc.
ADVANTAGES
Reduced data entry, storage, and retrieval costs
Facilitated development of new applications program
DISADVANTAGES
CHAPTER-3: ORACLE ARCHITECTURAL COMPONENTS AND DATABASE CREATION
Overview of Primary Components
1. Oracle Server:- There are several files, processes, and memory structures in an Oracle server; however, not all of them are used when processing a SQL statement. Some are used to improve the performance of the database, ensure that the database can be recovered in the event of a software or hardware error, or perform other tasks necessary to maintain the database. The Oracle server consists of an Oracle instance and an Oracle database.
2. Oracle Instance:- An Oracle instance is the combination of the background
processes and memory structure. The instance must be started to access the data in
the database. Every time an instance is started, a System Global Area (SGA) is
allocated and Oracle background processes are started. Background processes
perform functions on behalf of the invoking process. They consolidate functions
that would otherwise be handled by multiple Oracle programs running for each
user. The background processes perform input/output (I/O) and monitor other Oracle processes to provide increased parallelism for better performance and reliability.
3. Oracle Database:- An Oracle database consists of operating system files, also
known as database files, that provide the actual physical storage for database
information. The database files are used to ensure that the data is kept consistent
and can be recovered in the event of a failure of the instance.
4. Other Key Files:- Non-database files are used to configure the instance,
authenticate privileged users, and recover the database in the event of a disk
failure.
5. User and Server Processes:- The user and server processes are the primary processes
involved when a SQL statement is executed; however, other processes may help
the server complete the processing of the SQL statement.
6. Other Processes:- Many other processes exist that are used by options within Oracle, such as Advanced Queuing, Real Application Clusters, Shared Server, Advanced Replication, and so on.
An Oracle Server:
The Oracle server can run on different computers in one of the following ways:
1. Client-Application Server-server
2. Client-Server
3. Host-Based
1. Client-Application Server-Server:- (Three-Tier) Users access the database from
their personal computers (client) through an application server, which is used for
the application’s requirements.
2. Client-Server:- (Two-Tier) Users access the database from their personal
computer (client) over a network, and the database sits on a separate computer
(server).
3. Host-Based:- Users are connected directly to the same computer on which the
database resides.
An Oracle Instance:
An Oracle instance consists of the System Global Area (SGA) memory structure and the
background processes used to manage a database. An instance is identified by using methods
specific to each operating system. The instance can open and use only one database at a time.
Connection to an Oracle Instance
Before users can submit SQL statements to an Oracle database, they must connect to an instance.
The user starts a tool such as SQL*Plus or runs an application developed using a tool
such as Oracle Forms. This application or tool is executed as a user process.
In the most basic configuration, when a user logs on to the Oracle Server, a process is
created on the computer running the Oracle Server. This process is called a server
process. The server process communicates with the Oracle instance on behalf of the
user process that runs on the client. The server process executes SQL statements on
behalf of the user.
Connection
1. The user logs on to the operating system running the Oracle instance and
starts an application or tool that accesses the database on that system. The
communication pathway is established using the interprocess
communication mechanisms available on the host operating system.
2. The user starts the application or tool on a local computer and connects
over a network to the computer running the Oracle instance. In this
configuration, called client-server, network software is used to
communicate between the user and the Oracle Server.
3. In a three-tier connection, the user’s computer communicates over the
network to an application or a network server, which is connected through
a network to the machine running the Oracle instance. For example, the
user runs a browser on a network to use an application residing on an NT
server that retrieves data from an Oracle database running on a UNIX
host.
Session
A session is a specific connection of a user to an Oracle server. The session starts when the user
is validated by the Oracle Server, and it ends when the user logs out or when there is an
abnormal termination. For a given database user, many concurrent sessions are possible if the
user logs on from many tools, applications, or terminals at the same time. Except for some
specialized database administration tools, starting a database session requires that the Oracle
Server be available for use.
An Oracle database:
Is a collection of data that is treated as a unit
Consists of three file types
An Oracle Database
The general purpose of a database is to store and retrieve information. An Oracle database has a
logical and a physical structure. The physical structure of the database is the set of operating
system files in the database. An Oracle database consists of three file types:
1. Data files containing the actual data in the database.
2. Redo logs containing a record of changes made to the database to enable recovery
of the data in case of failures.
3. Control files containing information necessary to maintain and verify database
integrity.
The Oracle Server also uses files that are not part of the database:
The parameter file defines the characteristics of an Oracle instance. For example, it contains parameters that size some of the memory structures in the SGA.
The password file authenticates users privileged to start up and shut down an Oracle instance.
Archived redo log files are offline copies of the redo log files that may be necessary to recover from media failures.
Other key files exist and are required to start up and use a database, for example: parameter files, configuration files, password files and so on. However, the physical structure of an Oracle database includes only three types of files: control files, data files, and redo log files.
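These three file types can be listed from the standard dynamic performance views; a small sketch:

SELECT name   FROM v$datafile;      -- data files
SELECT member FROM v$logfile;       -- redo log files
SELECT name   FROM v$controlfile;   -- control files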
3.7 MEMORY STRUCTURE
SGA is used to store database information that is shared by database processes. It contains data
and control information for the Oracle Server and is allocated in the virtual memory of the
computer where Oracle resides.
The following statement can be used to view SGA memory allocation:
SHOW SGA;
The size of the SGA is determined by several initialization parameters. The parameters that most affect the SGA size are:
DB_CACHE_SIZE: The size of the cache of standard blocks.
LOG_BUFFER: The number of bytes allocated for the redo log buffer cache.
SHARED_POOL_SIZE: The size in bytes of the area devoted to shared SQL and PL/SQL.
LARGE_POOL_SIZE: The size of the large pool; the default size is zero.
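A small sketch of inspecting and adjusting these parameters from SQL*Plus (the 64M value is purely illustrative, not a recommendation):

SHOW PARAMETER db_cache_size
SHOW PARAMETER shared_pool_size
ALTER SYSTEM SET shared_pool_size = 64M;   -- dynamic change, within SGA_MAX_SIZE
SELECT * FROM v$sga;                       -- the same totals reported by SHOW SGA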
Dynamic SGA
A dynamic SGA implements an infrastructure that allows the SGA configuration to change without shutting down the instance. This allows the sizes of the database buffer cache, shared pool, and large pool to be changed without shutting down the instance. Conceivably, the database buffer cache, shared pool, and large pool could be initially under-configured and would grow and shrink depending upon their respective workloads, up to a maximum of SGA_MAX_SIZE.
The SGA consists of the following memory structures:
Shared Pool
Database Buffer Cache
Redo Log Buffer
Other structures (for example, lock and latch management, statistical data)
There are two additional memory structures that can be configured within the SGA:
Large Pool
Java Pool
SGA is dynamic
Sized by the SGA_MAX_SIZE parameter
Allocated and tracked in granules by SGA components
Contiguous virtual memory allocation
Granule size is based on the total estimated SGA_MAX_SIZE
Unit of Allocation
A granule is a unit of contiguous virtual memory allocation. The size of a granule depends on the estimated total SGA size, whose calculation is based on the value of the parameter SGA_MAX_SIZE:
- 4 MB if the estimated SGA size is < 128 MB
- 16 MB otherwise
Shared Pool
Used to store:
Most recently executed SQL statements
Most recently used data definitions
Sized by the parameter SHARED_POOL_SIZE
The shared pool environment contains both fixed and variable structures. The fixed structures
remain relatively the same size, whereas the variable structures grow and shrink based on user
and program requirements. The actual sizing for the fixed and variable structures is based on an
initialization parameter and the work of an Oracle internal algorithm.
Since the shared pool is used for objects that can be shared globally, such as reusable SQL
execution plans; PL/SQL packages, procedures, and functions; and cursor information, it must be
sized to accommodate the needs of both the fixed and variable areas. Memory allocation for the
shared pool is determined by the SHARED_POOL_SIZE initialization parameter. It can be
dynamically resized using ALTER SYSTEM SET. After performance analysis, this can be
adjusted but the total SGA size cannot exceed SGA_MAX_SIZE.
Library Cache
Consists of two structures:
Shared SQL area
Shared PL/SQL area
Size is determined by the shared pool sizing.
The library cache size is based on the sizing defined for the shared pool. Memory is allocated
when a statement is parsed or a program unit is called. If the size of the shared pool is too small,
statements are continually reloaded into the library cache, which affects performance. The library
cache is managed by a least recently used (LRU) algorithm. As the cache fills, less recently used
execution paths and parse trees are removed from the library cache to make room for the new
entries. If the SQL or PL/SQL statements are not reused, they eventually are aged out.
The library cache consists of two structures:
Shared SQL:- The shared SQL stores and shares the execution plan and parse tree for
SQL statements run against the database. The second time that an identical SQL statement is run, it is able to take advantage of the parse information available in the
shared SQL to expedite its execution. To ensure that SQL statements use a shared SQL
area whenever possible, the text, schema, and bind variables must be exactly the same.
Shared PL/SQL:- The shared PL/SQL area stores and shares the most recently executed
PL/SQL statements. Parsed and compiled program units and procedures are stored in this
area.
Data Dictionary Cache
Size is determined by the shared pool sizing.
The data dictionary cache is also referred to as the dictionary cache or row cache. Caching data dictionary information in memory improves response time. Information about the database is stored in the data dictionary tables. When this information is needed by the database, the data dictionary tables are read, and the data that is returned is stored in the data dictionary cache.
The overall size is dependent on the size of the shared pool and is managed internally by the database. If the data dictionary cache is too small, then the database has to query the data dictionary tables repeatedly for information needed by the database. These queries are called recursive calls and are slower than queries that are handled by the data dictionary cache.
Database Buffer Cache
Stores copies of data blocks that have been retrieved from the data files
Enables great performance gains when you obtain and update data
Managed through an LRU algorithm
DB_BLOCK_SIZE determines the primary block size
When a query is processed, the Oracle server process looks in the database buffer cache for any blocks it needs. If the block is not found in the database buffer cache, the server process reads the block from the data files and places a copy in the database buffer cache. Because subsequent requests for the same block may find the block in memory, those requests may not require physical reads. The Oracle server uses a least recently used algorithm to age out buffers that have not been accessed recently to make room for new blocks in the database buffer cache.
The database buffer cache consists of independent sub-caches:
DB_CACHE_SIZE
DB_KEEP_CACHE_SIZE
DB_RECYCLE_CACHE_SIZE
These can be dynamically resized.
The size of each buffer in the buffer cache is equal to the size of an Oracle block, and it is specified by the DB_BLOCK_SIZE parameter. The database buffer cache consists of independent sub-caches for buffer pools and for multiple block sizes. The parameter DB_BLOCK_SIZE determines the primary block size, which is used for the SYSTEM tablespace.
Three parameters define the sizes of the buffer caches:
51
1. DB_CACHE_SIZE:- Sizes the default buffer cache only; it always exists and
cannot be set to zero.
2. DB_KEEP_CACHE_SIZE:- Sizes the keep buffer cache, which is used to retain blocks in
memory that are likely to be reused.
3. DB_RECYCLE_CACHE_SIZE:- Sizes the recycle buffer cache, which is used to
eliminate blocks from memory that have little chance of being reused.
An Oracle database can be created with a standard block size and up to four non-standard block
sizes. Non-standard block sizes can have any power-of-two values between 2 KB and 32 KB.
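A minimal sketch of sizing the keep and recycle pools described above and assigning a table to the keep pool (the sizes and the table name hr.lookup_codes are illustrative assumptions, not part of this project):
SQL> alter system set db_keep_cache_size = 16M;
SQL> alter system set db_recycle_cache_size = 8M;
SQL> -- hypothetical table placed in the keep pool so its blocks tend to stay cached
SQL> alter table hr.lookup_codes storage (buffer_pool keep);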
The buffer cache advisory feature enables and disables statistics gathering for predicting
behavior with different cache sizes. The information provided by these statistics can help the DBA
size the buffer cache optimally for a given workload. The buffer cache advisory information is
collected and displayed through the V$DB_CACHE_ADVICE view.
The buffer cache advisory is enabled via the initialization parameter DB_CACHE_ADVICE, which
is a dynamic parameter that can be changed with ALTER SYSTEM. Three values (OFF, ON, READY)
are available.
OFF:- Advisory is turned off and the memory for the advisory is not allocated
ON:- Advisory is turned on and both CPU and memory overhead are incurred.
READY:- Advisory is turned off but the memory for the advisory remains allocated.
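A minimal sketch of enabling the advisory and reading its predictions for the default pool (column names as documented for V$DB_CACHE_ADVICE; the 8 KB block size matches the examples in this document):
SQL> alter system set db_cache_advice = on;
SQL> select size_for_estimate, estd_physical_reads
  2  from v$db_cache_advice
  3  where name = 'DEFAULT' and block_size = 8192;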
52
3.7.6 REDO LOG BUFFER
Records all changes made to the database blocks
Primary purpose is recovery
Changes recorded within are called redo entries
Redo entries contain information to reconstruct or redo changes
Size defined by LOG_BUFFER
The redo log buffer cache is a circular buffer that contains changes made to data file blocks.
This information is stored in redo entries. Redo entries contain the information necessary to
reconstruct, or redo, changes made by INSERT, UPDATE, CREATE, ALTER, or DROP
operations.
The size of the redo log buffer cache is defined by the initialization parameter LOG_BUFFER.
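As a quick sketch, the configured log buffer size and the redo activity it is absorbing can be checked from SQL*Plus:
SQL> show parameter log_buffer
SQL> select name, value from v$sysstat
  2  where name in ('redo entries', 'redo size');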
53
3.7.7 LARGE POOL
An optional area of memory in the SGA
Relieves the burden placed on the Shared Pool
Used for:
Session memory (UGA) for the Shared Server
I/O server processes
Backup and restore operations or RMAN
Parallel execution message buffers
PARALLEL_AUTOMATIC_TUNING set to TRUE
Does not use an LRU list
Sized by LARGE_POOL_SIZE
When users connect through the shared server, Oracle needs to allocate additional space in the
shared pool for storing information about the connections between the user processes, dispatchers,
and servers. The large pool relieves the burden on areas within the shared pool. The shared pool
does not have to give up memory for caching SQL parse trees in favor of shared server session
information, I/O, and backup and recovery processes. The performance gain is from the
reduction of overhead from increasing and shrinking of the shared SQL cache.
Backup and Restore:- Recovery Manager (RMAN) uses the large pool when the
BACKUP_DISK_IO=n and BACKUP_TAPE_IO_SLAVE=TRUE parameters are set. If the
large pool is configured but is not large enough, the allocation of memory from the large pool
fails. RMAN writes an error message to the alert log file and does not use I/O slaves for backup
or restore.
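A minimal sketch (the 64M value is illustrative) of sizing the large pool and checking how it is being used:
SQL> alter system set large_pool_size = 64M;
SQL> select pool, name, bytes from v$sgastat where pool = 'large pool';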
54
3.7.8 JAVA POOL
The Java pool is an optional setting but is required if installing and using Java. Its size is set, in
bytes, using the JAVA_POOL_SIZE parameter.
55
The Program Global Area or Process Global Area (PGA) is a memory region that
contains data and control information for a single server process or a single background
process. The PGA is allocated when a process is created and de-allocated when the
process is terminated. In contrast to the SGA, which is shared by several processes, the
PGA is an area that is used by only one process. In a dedicated server configuration, the
PGA includes these components:
Sort Area: - Used for any sorts that may be required to process the SQL
statement.
Session Information: - Includes user privileges and performance statistics for the
session.
Cursor State: - Indicates the stage in the processing of the SQL statements that
are currently used by the session.
Stack Space: - Contains other session variables.
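From Oracle 9i onward the total PGA used by dedicated server processes can be managed automatically; a sketch (the 200M target is illustrative) of setting the target and monitoring it through V$PGASTAT:
SQL> alter system set pga_aggregate_target = 200M;
SQL> select name, value from v$pgastat
  2  where name in ('aggregate PGA target parameter', 'total PGA allocated');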
56
A database user who needs to request information from the database must first make a
connection with the Oracle server. The connection is requested using a database interface tool,
such as SQL*Plus, which begins the user process. The user process does not interact directly
with the Oracle server. Rather, it generates calls through the user program interface (UPI), which
creates a session and starts a server process.
57
Once a user has established a connection, a server process is started to handle the user process's
requests. A server process can be either a dedicated server process or a shared server process. In a
dedicated server environment, the server process handles the requests of a single user process and
is terminated when that user process disconnects. In a shared server environment, a server process
handles requests from several user processes. The server process communicates with the Oracle
server using the Oracle Program Interface (OPI).
58
LMDn
RECO
CJQ0
LMON
Snnn
Dnnn
Pnnn
LCKn
QMNn
The server process records changes to rollback and data blocks in the buffer cache. Database
Writer (DBWn) writes the dirty buffers from the database buffer cache to the data files. It
ensures that a sufficient number of free buffers-buffers that can be overwritten when server
processes need to read in blocks from the data files-are available in the database buffer cache.
Database performance is improved because server processes make changes only in the buffer
cache.
59
3.7.11.2 LOG WRITER (LGWR)
LGWR writes:
At commit
When one-third full
When there is 1 MB of redo
Every three seconds
Before DBWn writes
Because the redo is needed for recovery, LGWR confirms the commit only after the redo is
written to disk.
LGWR can also call on DBWn to write to the data files.
60
3.7.11.3 SYSTEM MONITOR (SMON)
Responsibilities:
Instance recovery
Rolls forward changes in redo logs
Opens database for user access
Rolls back uncommitted transactions
Coalesces free space
De-allocates temporary segments
If the Oracle instance fails, any information in the SGA that has not been written to disk is lost.
For example, the failure of the operating system causes an instance failure. After the loss of the
instance, the background process SMON automatically performs instance recovery when the
database is reopened. Instance recovery consists of the following steps:
1. Rolling forward to recover data that has not been recorded in the files but that has been
recorded in the online redo log. This data has not been written to disk because of the loss of
61
the SGA during instance failure. During this process, SMON reads the redo log files and
applies the changes recorded in the redo log to the data blocks. Because all committed
transactions have been written to the redo logs, this process completely recovers these
transactions.
2. Opening the database so that users can log on. Any data that is not locked by unrecorded
transactions is immediately available.
3. Rolling back uncommitted transactions. They are rolled back by SMON or by the
individual server processes as they access locked data.
62
3.7.11.5 CHECKPOINT (CKPT)
Responsible for:
Signaling DBWn at checkpoints
Updating data file headers with checkpoint information
Updating control files with checkpoint information
63
An event called a checkpoint occurs when the Oracle background process DBWn writes all the
modified database buffers in the SGA, including both committed and uncommitted data, to the
data files.
Checkpoints are implemented for the following reasons:
Checkpoints ensure that data blocks in memory that change frequently are written to
data files regularly. Because of the least recently used algorithm of DBWn, a data block
that changes frequently might never qualify as the least recently used block and thus
might never be written to disk if checkpoints did not occur.
Because all database changes up to the checkpoint have been recorded in the data files,
redo log entries before the checkpoint no longer need to be applied to the data files if
instance recovery is required. Therefore, checkpoints are useful because they can
expedite instance recovery.
CKPT does not write data blocks to disk or redo blocks to the online redo logs.
64
The Archive Process
All other background processes are optional, depending on the configuration of the database;
however, one of them, ARCn, is crucial to recovering a database after the loss of a disk. As
online redo log files fill, the Oracle server begins writing to the next online redo log file. The
process of switching from one redo log to another is called a log switch. The ARCn process
initiates backing up, or archiving, of the filled log group at every log switch. It automatically
archives the online redo log before the log can be reused, so that all of the changes made to the
database are preserved. This enables the DBA to recover the database to the point of failure, even
if a disk drive is damaged.
One of the important decisions that a DBA has to make is whether to configure the database to
operate in ARCHIVELOG or in NOARCHIVELOG mode.
NOARCHIVELOG MODE:- In NOARCHIVELOG mode, the online redo log files are
overwritten each time a log switch occurs. LGWR does not overwrite a redo log group until the
checkpoint for that group is complete. This ensures that committed data can be recovered if there
is an instance crash. During the instance crash, only the SGA is lost. There is no loss of disks,
only memory. For example, an operating system crash causes an instance crash.
ARCHIVELOG MODE:- If the database is configured to run in ARCHIVELOG mode,
inactive group of filled online redo log files must be archived before they can be used again.
Since changes made to the database are recorded in the online redo log files, the database
administrator can use the physical backup of the data files and the archived online redo log files
to recover the database without losing any committed data because of any single point of failure,
including the loss of a disk. Usually, a production database is configured to run in
ARCHIVELOG mode.
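For reference, a minimal sketch of switching a database into ARCHIVELOG mode (the database must be mounted, not open, when the mode is changed):
SQL> shutdown immediate
SQL> startup mount
SQL> alter database archivelog;
SQL> alter database open;
SQL> archive log list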
65
3.7.12 LOGICAL STRUCTURE
66
The Oracle server components that are used depend on the type of SQL statement:
Queries return rows
DML statements log changes
Commit ensures transaction recovery
Some Oracle server components do not participate in SQL statement processing.
A data manipulation language (DML) statement requires only two phases of processing:
1. Parse is the same as the parse phase used for processing a query.
2. Execute requires additional processing to make data changes.
The server process records the before-image to the undo block and updates the data block.
Both of these changes are done in the database buffer cache. Any changed blocks in the buffer
cache are marked as dirty buffers; that is, buffers that are not the same as the corresponding block
on the disk.
The processing of a DELETE or INSERT command uses similar steps. The before-image for a
DELETE contains the column values in the deleted rows, and the before-image of an INSERT
contains the row's location information.
67
3.7.14 DATABASE CREATION
Before creating the database we have to define some parameters according to the computer
configuration so that performance improves and less I/O is used. If we do not configure the
parameters, Oracle automatically takes the default values.
Db_name=lalit
Db_block_size=8192
Control_files=’G:\oracle\lalit\control.ctl’
Shared_pool_size=200m
Sga_target=300m
Undo_management=auto
Undo_tablespace=undo1
Here the location is given as per the user's need. The user can choose any location to store the
parameter file. If no location is chosen, the parameter file is stored in the default location, the
DATABASE folder under the Oracle home. The parameter file is saved with the name
"init<database_name>.ora". When we start creating the database, first of all the control files are
read and then the database is created.
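A minimal sketch of the CREATE DATABASE statement that could follow these settings (all file names, sizes and the temporary tablespace are illustrative assumptions consistent with the parameters above, not the project's actual script):
SQL> startup nomount pfile='G:\oracle\lalit\initlalit.ora'
SQL> create database lalit
  2  datafile 'G:\oracle\lalit\system01.dbf' size 300M
  3  sysaux datafile 'G:\oracle\lalit\sysaux01.dbf' size 200M
  4  undo tablespace undo1 datafile 'G:\oracle\lalit\undo01.dbf' size 100M
  5  default temporary tablespace temp tempfile 'G:\oracle\lalit\temp01.dbf' size 50M
  6  logfile group 1 ('G:\oracle\lalit\redo01.log') size 50M,
  7          group 2 ('G:\oracle\lalit\redo02.log') size 50M;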
68
Along with this, our parameter file is generated with the non-default values that we have already set.
69
Some base tables in Oracle cannot be modified or read directly by a DBA, so views have to be
created over them:
@?/rdbms/admin/catalog.sql
The command above creates the data dictionary views that present the underlying base-table data
in a user-understandable form. After that, the following script is run to create the standard
PL/SQL packages and procedures needed by the database:
@?/rdbms/admin/catproc.sql
70
CHAPTER-4: INTRODUCTION TO DATABASE
MIGRATION
71
INTRODUCTION TO DATA MIGRATION
DATA MIGRATION
Data Migration is the process of transferring data between storage types, formats, or computer
systems. Data migration is usually performed programmatically to achieve an automated
migration, freeing up human resources from tedious tasks. It is required when organizations or
individuals change computer systems or upgrade to new systems, or when systems merge (such
as when the organizations that use them undergo a merger/takeover). To achieve an effective data
migration procedure, data on the old system is mapped to the new system providing a design for
data extraction and data loading. The design relates old data formats to the new system's formats
and requirements. Programmatic data migration may involve many phases but it minimally
includes data extraction where data is read from the old system and data loading where data is
written to the new system. If a decision has been made to provide a set input file specification for
loading data onto the target system, this allows a pre-load 'data validation' step to be put in place,
interrupting the standard E(T)L process. Such a data validation process can be designed to
interrogate the data to be migrated, to ensure that it meets the pre-defined criteria of the target
environment, and the input file specification. An alternative strategy is to have on-the-fly data
validation occurring at the point of loading, which can be designed to report on load rejection
errors as the load progresses. However, in the event that the extracted and transformed data
elements are highly 'integrated' with one another, and the presence of all extracted data in the
target system is essential to system functionality, this strategy can have detrimental, and not
easily quantifiable effects. After loading into the new system, results are subjected to data
verification to determine whether data was accurately translated, is complete, and supports
processes in the new system. During verification, there may be a need for a parallel run of both
systems to identify areas of disparity and forestall erroneous loss. Automated and manual data
cleaning is commonly performed in migration to improve data quality, eliminate redundant or
obsolete information, and match the requirements of the new system. Data migration phases
(design, extraction, cleansing, load, verification) for applications of moderate to high complexity
are commonly repeated several times before the new system is deployed.
72
4.1 DATA MIGRATION CATEGORIES
Data is stored on various media in files or databases, is generated and consumed by software
applications which in turn support business processes. The need to migrate data can be driven by
multiple business requirements and the approach taken to the migration depends on those
requirements. Three major migration categories are proposed on this basis.
STORAGE MIGRATION
A business may choose to rationalize the physical media to take advantage of more efficient
storage technologies. This will result in having to move physical blocks of data from one tape or
disk to another, often using virtualization techniques. The data format and content itself will not
usually be changed in the process and can normally be achieved with minimal or no impact to
the layers above.
DATABASE MIGRATION
Similarly, it may be necessary to move from one database vendor to another, or to upgrade the
version of database software being used. The latter case is less likely to require a physical data
migration, but this can happen with major upgrades. In these cases a physical transformation
process may be required since the underlying data format can change significantly.
APPLICATION MIGRATION
Changing application vendor, for instance to a new CRM or ERP platform, will inevitably involve
substantial transformation as almost every application or suite operates on its own specific data
model. Further, to allow the application to be sold to the widest possible market, commercial off-
the-shelf packages are generally configured for each customer using metadata. Application
programming interfaces (APIs) are supplied to protect the integrity of the data they have to
73
handle. Use of the API is normally a condition of the software warranty, although a waiver may
be allowed if the vendor's own or certified partner professional services and tools are used.
Business processes operate through a combination of human and application systems actions,
often orchestrated by business process management tools. When these change they can require
the movement of data from one store, database or application to another to reflect the changes to
the organization and information about customers, products and operations. Examples of such
migration drivers are mergers and acquisitions, business optimization, and reorganization to
attack new markets or respond to competitive threats.
Like data migration it may be necessary to move from one database vendor to another, or to
upgrade the version of database software being used. The latter case is less likely to require a
physical data migration, but this can happen with major upgrades. In these cases a physical
transformation process may be required since the underlying data format can change
significantly. This may or may not affect behavior in the applications layer, depending largely on
whether the data manipulation language or protocol has changed - but modern applications are
written to be agnostic to the database technology so that a change from Oracle to MySQL, DB2
or SQL Server should only require a testing cycle to be confident that both functional and non-
functional performance has not been adversely affected. Carefully examine the database and the
data. Determine what you are trying to accomplish with the database. Only then can you plan
what the new database environment should look like. Database migration is necessary when a
company upgrades its database or system software, either from one version to another or from
one program to an entirely different program. Software can be specifically written, using entire
programs or just scripts, to facilitate data migration. Such upgrades or program switches can take
74
place as a result of regular company practices or as a result of directives mandated in the wake of
a company takeover.
For many companies, migrating from Windows to Linux makes sense. The reasons are
compelling: greater stability and reliability, lower cost, access to application source code, greater
security, and conformity with open standards, according to numerous independent studies and
industry experts. But for those charged with spearheading the actual migration, the project may
seem difficult, unwieldy, or untenable. They may fear unanticipated technological roadblocks as
well as employees reticent to give up the tools they are used to using. Before launching into a
migration, make sure your reasons for migrating are business-driven. "You'll fail if you start a
migration from the ideological side instead," notes Stefan Werden, leader of the Linux
architecture team for Novell SUSE for Europe, the Middle East, and Africa. To make that
decision, conduct a thorough cost-benefit analysis, projected out several years, based on what it
would take to migrate and what it would cost not to migrate--not only financial costs, but other
costs, such as the amount of overall control you want over your organization's software
requirements. With a Windows-to-Linux migration, "nobody can force you to upgrade or deny
you a security fix. You always call the shots, so you are guaranteed to save money and retain
more control in the long run," says Nicholas Petreley, a Linux analyst at IT market intelligence
firm Evans Data Corp. of Santa Cruz, Calif. Of course, reducing costs can be another driving
factor in the decision-making process. According to Forrester Research, 68% of companies in
North America consider lowering overall operating costs a top priority, and Linux is one means
to that end. To achieve success in a Windows to Linux desktop migration, experts advise
planning, planning, and more planning. Before starting, take these STEPS:
GET EXECUTIVE BUY-IN- This step is crucial. Without executive support, your project may
be doomed to failure. By gathering this support before launching your project, you can be sure
the project will be supported, both financially and otherwise.
75
EXAMINE EXISTING ENVIRONMENT - Also, consider whether it makes sense to invest in
new hardware, or whether you will be migrating with existing hardware.
CHOOSE LINUX VERSION - It doesn't matter much whether you settle on Red Hat, SUSE,
Mandrake, Gentoo, Debian, or another flavor altogether. It's more important that your migration
team is comfortable with the Linux distribution chosen.
IDENTIFY THE EXPERTS -Often, these include transactional workers and developers –
employees who rely on basic applications like Web browsers, productivity suites, and email –
and not workers who routinely use powerful programs like Photoshop or Macintosh-based
applications. "It's about identifying needs or pockets where you can save the greatest amount of
money with minimal disruption," notes Susan Jeffries, marketing manager for the Java Desktop
System at Sun Microsystems Inc.
CONSIDER USING OUTSIDE CONSULTANTS- Unless you or members of your team have
attempted such a migration before, it can be well worth the money to hire an outside consultant.
Doing so can save countless hours of frustration. "When you run into a problem like not being
able to get something to print, someone familiar with Linux will be able to fix it in 10 minutes. If
you don't have such a person, it could take a week to fix the problem," Petreley says. Consultants
charge in various ways--by the hour, or by the project. A reasonable hourly price range for
general consulting with an experienced systems engineer costs $125 to $225 per hour. Total cost
depends on geographic location, the cost of consultants in the area, and the number of
employees, environment and number of applications to migrate. Typically, a migration
consulting service may begin with a workshop, proof of concept and architecture and
implementation review.
Here are some tips you can use to enhance your Windows to Linux migration experience,
along with solutions to some daunting challenges.
76
CHOOSING A LINUX DISTRIBUTION
When it comes to choosing a Linux distribution, there is a dizzying array of choices, including
the de facto standard Red Hat, Mandrake, SUSE, a Debian variant, or Fedora, a Red Hat-
and community-supported open source project to build a complete, general purpose operating
system exclusively from free software. Businesses usually can't lose by going with a de facto
standard, which today is SUSE, Red Hat, or Fedora. SUSE is backed by Novell, while the Java
Desktop System is supported by Sun. Both are solid companies, and both distributions also are
solid choices. SUSE can make a great deal of sense for companies used to managing their
desktop environment with systems management tools like ZenWorks and SMS. But for
businesses that have deployed Red Hat servers and have experience with the Red Hat network,
extending their IT infrastructure with Red Hat desktops also can make good sense.
Unlike Windows, Linux supports GNOME and KDE, as well as a wide variety of window
managers. Red Hat makes GNOME and KDE look and work alike. While GNOME has the
fewest features and is least likely to confuse users with options, KDE is more feature-rich and
powerful, but is so configurable that some users may end up spending too much time working with it.
DATABASE MIGRATION
Carefully examine the database and the data. Determine what you are trying to accomplish with
the database. Only then can you plan what the new database environment should look like.
Microsoft Access databases can be ported fairly easily to MySQL Database Server and
PostgreSQL. In situations where migration problems occur, some enterprising vendors are
creating solutions, such as Versora's ProgressionDB, which deals with some of the most common
conversion issues. Another option is Shareplex from Quest Software Inc. of Irvine, Calif. This
77
data replication software, mostly used to migrate Oracle to Linux, provides support for thousands
of rows per second for load distribution, disaster recovery, and migrations.
Oracle Recovery Manager satisfies the most pressing demands of performant, manageable
backup and recovery, for all Oracle data formats. A complete high availability and disaster
recovery strategy requires dependable data backup, restore, and recovery procedures. Oracle
Recovery Manager (RMAN) provides a comprehensive foundation for efficiently backing up and
recovering the Oracle database. It is designed to work intimately with the server, providing
block-level corruption detection during backup and restore. RMAN optimizes performance and
space consumption during backup with file multiplexing and backup set compression, and
integrates with Oracle Secure Backup, as well as third party media management products, for
tape backup. RMAN takes care of all underlying database procedures before and after backup or
restore, freeing dependency on OS and SQL*Plus scripts. It provides a common interface, via
command line and Enterprise Manager, for backup tasks across different host operating systems
and offers features not available through user-managed methods, such as parallelization of
backup/restore data streams, backup files retention policy, and detailed history of all backups.
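As a brief illustration of the RMAN interface described above (a sketch only; an open-database backup assumes the database runs in ARCHIVELOG mode):
RMAN> connect target /
RMAN> backup database plus archivelog;
RMAN> list backup summary;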
This chapter describes the procedure required to migrate a database from Windows to Linux using
the RMAN Convert Database command. Both Windows and Linux platforms have the same
endian format, which makes it possible to transfer the whole database, making the migration
process very straightforward and simple. To migrate between platforms that have a different
endian format, Cross Platform Transportable Tablespaces (XTTS) needs to be used instead.
78
4.5 LIST OF STEPS NEEDED TO COMPLETE THE MIGRATION
6. Copy converted datafiles, generated Transport Script and Parameter File to Linux
8. Edit the Transport Script and Parameter File changing the Windows paths to Linux paths
SQL> desc V$DB_TRANSPORTABLE_PLATFORM
Step 1 --> shutdown database
Step 2 --> startup mount
SQL> alter database open read only;
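The view described above can be queried to confirm that the Linux target has the same endian format as the source; a minimal sketch:
SQL> select platform_name, endian_format
  2  from v$db_transportable_platform
  3  where platform_name like 'Linux%';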
79
rman> convert database new database vikas1
transport script 'E:\trans'
to platform 'Linux IA (32-bit)';
// This command generates a converted file for every datafile in the database folder, plus a
// parameter file and a transport script in the E: drive; copy these files and paste them on the Linux machine.
HARDWARE REQUIREMENT
SOFTWARE REQUIREMENT
OPERATING SYSTEM -- Windows 98, 2000, NT, XP (32-bit), Linux Red Hat 5.
DATABASE -- Oracle 9i, 10g
TOOL—Oracle Recovery Manager
FRONT END—SQL Navigator
BACK END – Oracle 10g
80
CHAPTER-5: DATA DESIGN
81
DATA DESIGN
In the course of developing a set of leveled data-flow diagrams the analyst/designer is forced to
address how the system may be decomposed into component sub-systems, and to identify the
transaction data in the data model. There are different notations to draw data-flow diagrams,
defining different visual representations for processes, data stores, data flows, and external
82
entities. Data flow diagrams ("bubble charts") are directed graphs in which the nodes specify
processing activities and the arcs specify data items transmitted between processing nodes.
TOP-DOWN APPROACH
1. The system designer makes "a context level DFD" or Level 0, which shows the
"interaction" (data flows) between "the system" (represented by one process) and "the
system environment" (represented by terminators).
2. The system is "decomposed in lower-level DFD (Level 1)" into a set of "processes, data
stores, and the data flows between these processes and data stores".
83
3. Each process is then decomposed into an "even-lower-level diagram containing its sub
processes".
4. This approach "then continues on the subsequent sub processes", until a necessary and
sufficient level of detail is reached which is called the primitive process (aka chewable in
one bite).
DFD is also a virtually designable diagram that technically or diagrammatically describes the
inflow and outflow of data or information that is provided by the external entity.
LEVEL 0 (CONTEXT DIAGRAM)
This level shows the overall context of the system and its operating environment and shows the
whole system as just one process. It does not usually show data stores, unless they are "owned"
by external systems, e.g. are accessed by but not maintained by this system, however, these are
often shown as external entities.
84
LEVEL 1 (HIGH LEVEL DIAGRAM)
A Level 1 Data flow diagram for the same system. This level (level 1) shows all processes at the
first level of numbering, data stores, external entities and the data flows between them. The
purpose of this level is to show the major high-level processes of the system and their
interrelation. A process model will have one, and only one, level-1 diagram. A level-1 diagram
must be balanced with its parent context level diagram, i.e. there must be the same external
entities and the same data flows, these can be broken down to more detail in the level 1, example
the "inquiry" data flow could be split into "inquiry request" and "inquiry results" and still be
valid.
85
LEVEL 2 (LOW LEVEL DIAGRAM)
A Level 2 Data flow diagram showing the "Process Enquiry" process for the same system. This
level is decomposition of a process shown in a level-1 diagram, as such there should be a level-2
diagram for each and every process shown in a level-1 diagram. In this example processes 1.1,
1.2 & 1.3 are all children of process 1, together they wholly and completely describe process 1,
and combined must perform the full capacity of this parent process. As before, a level-2 diagram
must be balanced with its parent level-1 diagram test.
86
In the design of an information system that is based on a
database, the conceptual data model is, at a later stage (usually called logical design), mapped to
a logical data model, such as the relational model; this in turn is mapped to a physical model
during physical design. Note that sometimes, both of these phases are referred to as "physical
design”. There are a number of conventions for entity-relationship diagrams (ERDs). The
classical notation mainly relates to conceptual modeling. There are a range of notations
employed in logical and physical database design, such as IDEF1X
87
An entity may be a physical object, an event, or a concept such
as a customer transaction or order. Although the term entity is the one most commonly used,
following Chen we should really distinguish between an entity and an entity-type. An entity-type
is a category. An entity, strictly speaking, is an instance of a given entity-type. There are usually
many instances of an entity-type. Because the term entity-type is somewhat cumbersome, most
people tend to use the term entity as a synonym for this term. A relationship captures how two or
more entities are related to one another. Relationships can be thought of as verbs, linking two or
more nouns. Examples: an owns relationship between a company and a computer, a supervises
relationship between an employee and a department, a performs relationship between an artist
and a song, a proved relationship between a mathematician and a theorem.
DIAGRAMMING CONVENTIONS
Entity sets are drawn as rectangles, relationship sets as diamonds. If an entity set participates in a
relationship set, they are connected with a line. Attributes are drawn as ovals and are connected
with a line to exactly one entity or relationship set. Cardinality constraints are expressed as
follows:
a double line indicates a participation constraint, totality or surjectivity: all entities in the
entity set must participate in at least one relationship in the relationship set;
an arrow from entity set to relationship set indicates a key constraint, i.e. injectivity: each
entity of the entity set can participate in at most one relationship in the relationship set;
A thick line indicates both, i.e. bijectivity: each entity in the entity set is involved in
exactly one relationship.
An underlined name of an attribute indicates that it is a key: two different entities or
relationships with this attribute always have different values for this attribute.
Attributes are often omitted as they can clutter up a diagram; other diagram techniques often list
entity attributes within the rectangles drawn for entity sets.
88
CHAPTER-6: CODING
89
The Windows Database
C:\ORACLE\PRODUCT\10.2.0\FLASH_RECOVERY_AREA\ORCLW\ONLINELOG\O1_MF
_3_44QYBC2F_.LOG
90
1. Check platform compatibility between source and target OS
You need to check the platforms to be sure they have the same endian format; you also need to
save the platform_name string to use it later as part of the convert database command.
In order to execute dbms_tdb.check_db the database must be opened in read-only mode.
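A minimal sketch of the readiness check (the platform name string matches the one used in the convert command; skip_readonly tells the check to ignore read-only tablespaces):
SQL> set serveroutput on
SQL> declare
  2    db_ready boolean;
  3  begin
  4    -- returns TRUE when the database can be transported to the named platform
  5    db_ready := dbms_tdb.check_db('Linux IA (32-bit)', dbms_tdb.skip_readonly);
  6  end;
  7  /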
91
Database Buffers 88080384 bytes
Redo Buffers 7090176 bytes
Database mounted.
SQL> alter database open read only;
Database altered.
If the execution of dbms_tdb.check_db does not report any problems, the database is
ready for transport to the target platform.
If there are any external objects, take note of them; they will need to be handled manually.
92
declare
external boolean;
begin
/* value is ignored; with serveroutput on, dbms_tdb.check_external
* displays its report of external objects on console */
external := dbms_tdb.check_external;
end;
/
The following directories exist in the database:
SYS.DATA_PUMP_DIR, SYS.ORACLE_OCM_CONFIG_DIR, SYS.ADMIN_DIR,
SYS.WORK_DIR
PL/SQL procedure successfully completed.
Having successfully executed the checkup steps, with the database open in read-only
mode, the convert database command can be executed with RMAN.
In this example I am not using the db_file_name_convert '<source-path>','<target-
path>' option because the database is using Oracle Managed Files (OMF); when using
OMF, RMAN generates and displays the new filenames in the output of the convert
database command. At the end of the convert process, RMAN displays information
about how to complete the conversion on the target platform.
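For completeness, a sketch of how the convert command could be issued for this example, using the script and database names shown in the output below (the exact paths are assumptions):
RMAN> convert database new database 'ORCLLNX'
2> transport script 'G:\TECHNOLOGY\ORCLLNX\TRANSPORTSCRIPT'
3> to platform 'Linux IA (32-bit)';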
93
Directory SYS.DATA_PUMP_DIR found in the database
Directory SYS.ORACLE_OCM_CONFIG_DIR found in the database
94
Run SQL script G:\TECHNOLOGY\ORCLLNX\TRANSPORTSCRIPT on the target platform
to create
database
Edit init.ora file
C:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\INIT_00JIGSKL_1_0.ORA. This
PFILE will be used to create the database on the target platform
To recompile all PL/SQL modules, run utlirp.sql and utlrp.sql on the target platform
To change the internal database identifier, use DBNEWID Utility
Finished backup at 19-MAY-11
We need to copy the generated files over to the Linux server; they include all converted
datafiles, the transport script and the generated pfile. If needed, create at this point the
directories you will need on the Linux server for the dump destinations and flash
recovery area, i.e.
mkdir –p /oradisk/oracle/app/admin/ORCLLNX/adump
mkdir –p /oradisk/oracle/app/admin/ORCLLNX/bdump
mkdir –p /oradisk/oracle/app/admin/ORCLLNX/cdump
mkdir –p /oradisk/oracle/app/admin/ORCLLNX/udump
mkdir –p /oradisk/database/ORCLLNX/FRA
O1_MF_SYSTEM_44TM3OPF_.DBF, then,
95
G:\TECHNOLOGY\ORCLW\DATAFILE>move
C:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\INIT_00JIGSKL_1_0.ORA
..\..\ORCLLNX\initORCLLNX.ora
Check that all required files are located on the ORCLLNX directory
G:\TECHNOLOGY\ORCLW\DATAFILE>dir ..\..\ORCLLNX
Volume in drive G is TECHNOLOGY
Volume Serial Number is 1877-B4EA
Directory of G:\TECHNOLOGY\ORCLLNX
The RMAN convert command executed on Windows generated a parameter file that
needs to be edited to be used on the target Linux server.
96
The pfile generated by Rman:
sga_target = 167772160
db_block_size = 8192
compatible = "10.2.0.3.0"
db_file_multiblock_read_count= 16
undo_management = "AUTO"
97
undo_tablespace = "UNDOTBS1"
job_queue_processes = 10
open_cursors = 300
pga_aggregate_target = 16777216
98
The pfile after being edited for the Linux server:
db_block_size = 8192
compatible = '10.2.0.3.0'
db_file_multiblock_read_count= 16
undo_management = 'AUTO'
undo_tablespace = 'UNDOTBS1'
job_queue_processes = 10
open_cursors = 300
pga_aggregate_target = 16777216
Before running the transport script on the target Linux server we need to edit it to set the
correct paths for datafiles, controlfiles and dump directories; we may also want to
change the values of tuning parameters. The script generated by RMAN:
-- The following commands will create a new control file and use it
-- to open the database.
-- Data used by Recovery Manager will be lost.
-- The contents of online logs will be lost and all backups will
-- be invalidated. Use this only if online logs are damaged.
-- After mounting the created controlfile, the following SQL
-- statement will place the database in the appropriate
-- protection mode:
-- ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE
STARTUP NOMOUNT
PFILE='C:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\INIT_00JIGSKL_1_0.ORA'
CREATE CONTROLFILE REUSE SET DATABASE "ORCLLNX" RESETLOGS
NOARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
99
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 292
LOGFILE
GROUP 1 SIZE 50M,
GROUP 2 SIZE 50M,
GROUP 3 SIZE 50M
DATAFILE
'C:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\DATA_D-ORCLW_I-1718464921_TSSYSTEM_FNO-1_07JIGSKL',
'C:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\DATA_D-ORCLW_I-1718464921_TSUNDOTBS1_FNO-2_08JIGSMD',
'C:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\DATA_D-ORCLW_I-1718464921_TSSYSAUX_FNO-3_09JIGSNG',
'C:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\DATA_D-ORCLW_I-1718464921_TSUSERS_FNO-4_0AJIGSOA'
CHARACTER SET AL32UTF8
;
-- Database can now be opened zeroing the online logs.
ALTER DATABASE OPEN RESETLOGS;
-- Commands to add tempfiles to temporary tablespaces.
-- Online tempfiles have complete space information.
-- Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP ADD TEMPFILE
SIZE 20971520 AUTOEXTEND ON NEXT 655360 MAXSIZE 33554431M;
-- End of tempfile additions.
--
100
set echo off
prompt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
prompt * Your database has been created successfully!
prompt * There are many things to think about for the new database. Here
prompt * is a checklist to help you stay on track:
prompt * 1. You may want to redefine the location of the directory objects.
prompt * 2. You may want to change the internal database identifier (DBID)
prompt * or the global database name for this database. Use the
prompt * NEWDBID Utility (nid).
prompt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SHUTDOWN IMMEDIATE
STARTUP UPGRADE
PFILE='C:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\INIT_00JIGSKL_1_0.ORA'
@@?/rdbms/admin/utlirp.sql
SHUTDOWN IMMEDIATE
STARTUP
PFILE='C:\ORACLE\PRODUCT\10.2.0\DB_1\DATABASE\INIT_00JIGSKL_1_0.ORA'
-- The following step will recompile all PL/SQL modules.
-- It may take several hours to complete.
@@ ?/rdbms/admin/utlrp.sql
set feedback 6;
The script edited to be run on Linux:
STARTUP NOMOUNT
PFILE='/oradisk/oracle/app/oracle/product/10.2.0/db_1/dbs/initORCLLNX.ORA'
CREATE CONTROLFILE REUSE SET DATABASE "ORCLLNX" RESETLOGS
NOARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
101
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 292
LOGFILE
GROUP 1 SIZE 50M,
GROUP 2 SIZE 50M,
GROUP 3 SIZE 50M
DATAFILE
'/oradisk/database/ORCLLNX/O1_MF_SYSTEM_44TM3OPF_.DBF',
'/oradisk/database/ORCLLNX/O1_MF_UNDOTBS1_44TM5F98_.DBF',
'/oradisk/database/ORCLLNX/O1_MF_SYSAUX_44TM6JTB_.DBF',
'/oradisk/database/ORCLLNX/O1_MF_USERS_44TM7BD5_.DBF'
CHARACTER SET AL32UTF8
;
-- Database can now be opened zeroing the online logs.
ALTER DATABASE OPEN RESETLOGS;
-- Commands to add tempfiles to temporary tablespaces.
-- Online tempfiles have complete space information.
-- Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP ADD TEMPFILE
SIZE 20971520 AUTOEXTEND ON NEXT 655360 MAXSIZE 33554431M;
-- End of tempfile additions.
--
set echo off
prompt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
prompt * Your database has been created successfully!
prompt * There are many things to think about for the new database. Here
prompt * is a checklist to help you stay on track:
prompt * 1. You may want to redefine the location of the directory objects.
prompt * 2. You may want to change the internal database identifier (DBID)
102
prompt * or the global database name for this database. Use the
prompt * NEWDBID Utility (nid).
prompt
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SHUTDOWN IMMEDIATE
STARTUP UPGRADE
PFILE='/oradisk/oracle/app/oracle/product/10.2.0/db_1/dbs/initORCLLNX.ORA'
@@?/rdbms/admin/utlirp.sql
SHUTDOWN IMMEDIATE
STARTUP
PFILE='/oradisk/oracle/app/oracle/product/10.2.0/db_1/dbs/initORCLLNX.ORA';
-- The following step will recompile all PL/SQL modules.
-- It may take several hours to complete.
@@?/rdbms/admin/utlrp.sql
set feedback 6;
103
Fixed Size 1266368 bytes
Variable Size 71306560 bytes
Database Buffers 88080384 bytes
Redo Buffers 7118848 bytes
SQL> CREATE CONTROLFILE REUSE SET DATABASE "ORCLLNX" RESETLOGS
NOARCHIVELOG
2 MAXLOGFILES 16
3 MAXLOGMEMBERS 3
4 MAXDATAFILES 100
5 MAXINSTANCES 8
6 MAXLOGHISTORY 292
7 LOGFILE
8 GROUP 1 SIZE 50M,
9 GROUP 2 SIZE 50M,
10 GROUP 3 SIZE 50M
11 DATAFILE
12 '/oradisk/database/ORCLLNX/O1_MF_SYSTEM_44TM3OPF_.DBF',
13 '/oradisk/database/ORCLLNX/O1_MF_UNDOTBS1_44TM5F98_.DBF',
14 '/oradisk/database/ORCLLNX/O1_MF_SYSAUX_44TM6JTB_.DBF',
15 '/oradisk/database/ORCLLNX/O1_MF_USERS_44TM7BD5_.DBF'
16 CHARACTER SET AL32UTF8
17 ;
Control file created.
SQL>
SQL> -- Database can now be opened zeroing the online logs.
SQL> ALTER DATABASE OPEN RESETLOGS;
Database altered.
SQL>
SQL> -- Commands to add tempfiles to temporary tablespaces.
SQL> -- Online tempfiles have complete space information.
SQL> -- Other tempfiles may require adjustment.
104
SQL> ALTER TABLESPACE TEMP ADD TEMPFILE
2 SIZE 20971520 AUTOEXTEND ON NEXT 655360 MAXSIZE 33554431M;
Tablespace altered.
SQL> -- End of tempfile additions.
SQL> --
SQL>
SQL> set echo off
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
* Your database has been created successfully!
* There are many things to think about for the new database. Here
* is a checklist to help you stay on track:
* 1. You may want to redefine the location of the directory objects.
* 2. You may want to change the internal database identifier (DBID)
* or the global database name for this database. Use the
* NEWDBID Utility (nid).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Database closed.
Database dismounted.
ORACLE instance shut down.
ORACLE instance started.
Total System Global Area 167772160 bytes
Fixed Size 1266368 bytes
Variable Size 71306560 bytes
Database Buffers 88080384 bytes
Redo Buffers 7118848 bytes
Database mounted.
Database opened.
SQL> WHENEVER SQLERROR EXIT;
SQL>
SQL> DOC
DOC>#######################################################################
105
DOC>#######################################################################
DOC> The following statement will cause an "ORA-01722: invalid number"
DOC> error if the database was not opened in UPGRADE mode
DOC>
DOC> If you encounter this error, execute "SHUTDOWN", "STARTUP UPGRADE" and
DOC> re-execute utlirp.sql
DOC>#######################################################################
DOC>#######################################################################
DOC>#
SQL> SELECT TO_NUMBER ('MUST_BE_OPEN_UPGRADE') FROM v$instance
2 WHERE status != 'OPEN MIGRATE';
no rows selected
SQL> Rem Store object numbers of all valid PL/SQL-based functional indexes
SQL> DROP TABLE utlirp_enabled_func_indexes;
Table dropped.
SQL> CREATE TABLE utlirp_enabled_func_indexes AS
2 SELECT obj# FROM ind$
3 WHERE bitand(property, 2048) != 0 AND bitand(flags, 1024) = 0;
Table created.
>>>>>>>>>>>>>> many output lines followed
106
DOC>#
SQL> select COUNT (*) "OBJECTS WITH ERRORS" from obj$ where status = 3;
OBJECTS WITH ERRORS
-------------------
0
SQL>
SQL>
SQL> DOC
DOC> the following query reports the number of errors caught during
DOC> recompilation. If this number is non-zero, please query the error
DOC> messages in the table UTL_RECOMP_ERRORS to see if any of these errors
DOC> are due to misconfiguration or resource constraints that must be
DOC> fixed before objects can compile successfully.
DOC>#
SQL> select COUNT (*) "ERRORS DURING RECOMPILATION" from utl_recomp_errors;
ERRORS DURING RECOMPILATION
---------------------------
0
SQL>
SQL>
SQL> Rem
=====================================================================
SQL> Rem Run component validation procedure
SQL> Rem
=====================================================================
SQL>
SQL> SET serveroutput on
SQL> EXECUTE dbms_registry_sys.validate_components;
PL/SQL procedure successfully completed.
SQL> SET serveroutput off
SQL>
107
SQL>
SQL> Rem
=====================================================================
======
SQL> Rem END utlrp.sql
SQL> Rem
=====================================================================
======
SQL> set feedback 6;
To change the database identifier you need to use the DBNEWID utility, nid. It is run
from the operating system prompt with the database mounted:
108
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.
109
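After the database is mounted, the utility itself is started from the operating system prompt; a minimal sketch (it prompts for the SYS password and asks for confirmation before changing the DBID):
$ nid TARGET=SYS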
Datafile /oradisk/database/ORCLLNX/O1_MF_USERS_44TM7BD5_.DBF - dbid changed
Datafile /oradisk/database/ORCLLNX/ORCLLNX/datafile/o1_mf_temp_44yxofkr_.tmp
- dbid changed
Control File /oradisk/database/ORCLLNX/orcllnx_control1.ctl - dbid changed
Control File /oradisk/database/ORCLLNX/orcllnx_control2.ctl - dbid changed
Instance shut down
Database ID for database ORCLLNX changed to 1179074095.
All previous backups and archived redo logs for this database are unusable.
Database is not aware of previous backups and archived logs in Recovery Area.
Database has been shutdown, open database with RESETLOGS option.
Successfully changed database ID.
DBNEWID - Completed succesfully.
avargas-pc:~/app/oracle/product/10.2.0/db_1/bin> sqlplus / as sysdba
SQL*Plus: Release 10.2.0.4.0 - Production on Wed Jun 11 10:28:22 2008
Copyright (c) 1982, 2007, Oracle. All Rights Reserved.
Connected to an idle instance.
SQL> startup mount;
ORACLE instance started.
Total System Global Area 167772160 bytes
Fixed Size 1266368 bytes
Variable Size 71306560 bytes
Database Buffers 88080384 bytes
Redo Buffers 7118848 bytes
Database mounted.
SQL> alter database open resetlogs;
Database altered
110
------------------------------
SYSTEM
UNDOTBS1
SYSAUX
TEMP
USERS
SQL> select file_name from dba_data_files;
FILE_NAME
-------------------------------------------------------------------------------
/oradisk/database/ORCLLNX/O1_MF_USERS_44TM7BD5_.DBF
/oradisk/database/ORCLLNX/O1_MF_SYSAUX_44TM6JTB_.DBF
/oradisk/database/ORCLLNX/O1_MF_UNDOTBS1_44TM5F98_.DBF
/oradisk/database/ORCLLNX/O1_MF_SYSTEM_44TM3OPF_.DBF
111
CHAPTER-7: DATABASE MIGRATION
TESTING
112
DATABASE MIGRATION TESTING
The approach to testing data and content migrations relies upon sampling, where some subset of
random data or content is selected and inspected to ensure the migration was completed “as
designed”. Those that have tested migrations using this approach are familiar with the typical
iterative test, debug and retest method, where subsequent executions of the testing process reveal
different error conditions as new samples are reviewed. Sampling works, but is reliant upon an
acceptable level of error and an assumption pertaining to repeatability. An acceptable level of
error implies that less than 100% of the data will be migrated without error and the level of error
is inversely proportionate to the number of samples tested (refer to sampling standards such as
ANSI/ASQ Z1.4). As per the assumption on repeatability, the fact that many migrations require
four, five or more iterations of testing with differing results implies that one of the key tenets of
sampling is not upheld, i.e., "non-conformities occur randomly and with statistical independence."
Even with these shortcomings, sampling has a role in a well-defined testing strategy, but what are
the other testing options? The following lists options for testing by the phase of the migration
process:
These tests occur early in the migration process, before any migration, even migration for testing
purposes, is completed. The pre-migration testing options include:
113
Verify scope of source systems and data with user community and IT. Verification should
include data to be included as well as excluded and, if applicable, tied to the specific
queries being used for the migration.
Define the source to target high-level mappings for each category of data or content and
verify that the desired type has been defined in the destination system.
Verify destination system data requirements such as the field names, field type,
mandatory fields, valid value lists and other field-level validation checks.
Using the source to destination mappings, test the source data against the requirements of
the destination system. For example, if the destination system has a mandatory field,
ensure that the appropriate source is not null, or if the destination system field has a list of
valid values, test to ensure that the appropriate source fields contain these valid values.
Test the fields that uniquely link source and target records and ensure that there is a
definitive mapping between the record sets
Test source and target system connections from the migration platform.
Test tool configuration against the migration specification which can often be completed
via black box testing on a field-by-field basis. If clever, testing here can also be used to
verify that a migration specification’s mappings are complete and accurate.
Conduct a formal design review of the migration specification when the pre-migration testing is
nearly complete, or during the earliest stages of the migration tool configuration.
114
Identification of supplementary sources
Data cleansing requirements
Performance requirements
Testing requirements
The formal design review should include representatives from the appropriate user communities,
IT and management. The outcome of a formal design review should include a list of open issues,
the means to close each issue and approve the migration specification and a process to maintain
the specification in sync with the migration tool configuration (which seems to continuously
change until the production migration).
Once a migration has been executed, additional end-to-end testing can be executed. Expect a
significant number of errors to be identified during the initial test runs, although this will be
minimized if sufficient pre-migration testing has been executed. Post-migration testing is typically
performed in a test environment and includes:
Test the throughput of the migration process (number of records per unit time). This
testing will be used to verify that the planned downtime is sufficient. For planning
purposes, consider the time to verify that the migration process was completed
successfully.
Compare Migrated Records to Records Generated by the Destination System – Ensure
that migrated records are complete and of the appropriate context.
Summary Verification – There are several techniques that provide summary information
including record counts and checksums. Here, the number of records migrated is
compiled from the destination system and then compared to the number of records
migrated. This approach provides only summary information, and if an issue exists, it
often does not provide insight into the issue's root cause (a minimal count-comparison
sketch follows this list).
Compare Migrated Records to Sources – Tests should verify that fields’ values are
migrated as per the migration specification. In short, source values and the field level
115
mappings are used to calculate the expected results at the destination. This testing can be
completed using sampling if appropriate or if the migration includes data that poses
significant business or compliance risk, 100% of the migrated data can be verified using
an automated testing tool.
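A minimal count-comparison sketch for the summary verification described above (the table name and the database link to the source system are illustrative assumptions):
SQL> select (select count(*) from customers@source_link) as source_rows,
  2         (select count(*) from customers) as migrated_rows
  3  from dual;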
The advantages of the automated approach include the ability to identify errors that are less
likely to occur (the proverbial needles in a haystack). Additionally, as an automated testing tool
can be configured in parallel with the configuration of the migration tool, the ability to test 100%
of the migrated data is available immediately following the first test migration. When compared
to sampling approaches, it is easy to see that automated testing saves significant time and
minimizes the typical iterative test, debug and retest found with sampling.
Functional subtleties related to the co-mingling of migrated data and data created in the
destination system may be difficult to identify early in the migration process. User acceptance
testing provides an opportunity for the user community to interact with legacy data in the
destination system prior to production release, and most often, this is the first such opportunity
for the users. Attention should be given to reporting, downstream feeds, and other system
processes that rely on migrated data.
116
CHAPTER-8: SCREENSHOTS
117
118
119
120
121
122
123
CHAPTER-9: CONCLUSION OF PROJECT
124
CONCLUSION
Database migration is a complex project that has to be thoroughly analyzed. Determine the
migration type and the necessary tools and resources to complete it. Various approaches have to
be fully reviewed and understood prior to choosing one. Migration can be done in-house or
outsourced to a consulting practice. Consider using third-party solutions to prevent hand-coding
migration scripts that will be time consuming and rather complex if data cleansing is involved. In
addition, these solutions can reduce or eliminate the downtime and facilitate the replication
between the systems. Outsourcing will bring in the best practices and appropriate tools to
complete the project. This project over database migration is a value-add tool that enables you to
identify any issues with your database migration before the project is signed off as complete. The
combination of Oracle Recovery manager, Oracle JDeveloper Application Migration Assistant,
and Oracle Database Migration Verifier provides a comprehensive solution for database and
application migration to the Oracle platform.
125
Appendix 1
INDEX
A
C
Command Notation 78
Coding 88
Choosing Linux 76
D
Data migration 71
Database migration 73
DBMS Features 30
Data Flow Diagram 81
E
External database 12
External data level 18
K
Key 7
R
T
Three-Level Architecture 17
Task of DBMS Package 30
Tables 7
W
Window Manager 76
126
Appendix 2
GLOSSARY
A
Attribute - A single data item related to a database object
C
Candidate Key - A candidate key is a combination of attributes that can be uniquely used to
identify a database record without any extraneous data.
Column- Database tables are composed of individual columns corresponding to the attributes of
the object.
D
Data - Data consists of a series of facts or statements that may have been collected, stored,
processed and/or manipulated but have not been organized or placed into context.
DB2- DB2 is a relational database system developed by IBM Corporation, originally for use on
large mainframe computer systems.
E
ER Diagram- An entity-relationship (ER) diagram is a specialized graphic that illustrates the
interrelationships between entities in a database.
K
Key - A database key is an attribute utilized to sort and/or identify data in some manner.
R
Report- A database report presents information retrieved from a table or query in a preformatted,
attractive manner.
S
Schema- A database schema is a collection of meta-data that describes the relations in a database
127
128