Physical Design
Physical Design
Physical Design
Definition of terms Describe the physical database design process Choosing Database Architecture
Hierarchical Database Model Network Database Model Relational Database Model Object Oriented Database Model Multidimensional Database Model
1
Purpose develop technology specs Deliverable program/data structures, technology purchases, organization redesigns
Implementation Maintenance
Purpose - translate the logical description of data into the technical specifications for storing and retrieving data Goal - create a design for storing data that will provide adequate performance and insure database integrity, security and recoverability
3
Primary goal is data processing efficiency to design physical database to minimize the time required by users to interact with information system. Aim is how to make processing of physical files and databases efficient with less attention on efficient use of space because of decreasing cost of storage space
4
relations
Decisions
Attribute Physical
data types
organizations
Indexes Query
expectations
technology used
Key Decisions in Physical Design Data type Format is chosen to minimise storage
space and maximise data integrity Physical Record - Grouping attributes from logical data model into physical records File Organizations - Arranging similarly structured records in secondary memory so that groups of records can be stored, retrieved and updated rapidly Indexes and database architectures - Selecting structures for efficient storing & connecting files to make data retrieval more efficient Query Optimization - Preparing Strategies for handling queries that will optimize performance
Figure 6-1 - Composite usage map (Pine Valley Furniture Company) (Cont.)
Data volumes
Figure 6-1 - Composite usage map (Pine Valley Furniture Company) (Cont.)
Figure 6-1 - Composite usage map (Pine Valley Furniture Company) (Cont.) Usage analysis:
140 purchased parts accessed per hour 80 quotations accessed from these 140 purchased part accesses 70 suppliers accessed from these 80 quotation accesses
10
Figure 6-1 - Composite usage map (Pine Valley Furniture Company) (Cont.) Usage analysis:
75 suppliers accessed per hour 40 quotations accessed from these 75 supplier accesses 40 purchased parts accessed from these 40 quotation accesses
11
Designing Fields
Field:
CHAR fixed-length character VARCHAR2 variable-length character (memo) LONG large number NUMBER positive/negative number DATE actual date BLOB binary large object (good for graphics, sound clips, etc.)
13
Default value assumed value if no explicit value Range control allowable value limitations (constraints or validation rules) Null value control allowing or prohibiting empty fields Referential integrity range control (and null value allowances) for foreign-key to primary-key match-ups
14
Physical Records
Physical Record: A group of fields stored in adjacent memory locations and retrieved together as a unit Page: The amount of data read or written in one I/O operation Blocking Factor: The number of physical records per page
15
Legacy Systems
6-12)
Current Technology
Data Warehouses
16
Data Model is a collection of conceptual tools for describing data, data relationships, data semantics and consistency constraints Underlying structure of the design of databases is described by a data model 3 groups of data models
Physical data models Object based logical models- Object oriented model Record based logical models- Hierarchical, Network and Relational models
17
Hierarchical Model
In hierarchical and Network model, data is represented by records and relationships among data are represented by links(pointers) Hierarchical Model is inverted tree structure Records are organised as collection of trees Record is collection of fields which contain single data value A link is an association between precisely two records HDB supports one-to-many relationship but does not support many-to-many relationship One of the oldest database architecture Used for high volume transaction processing & MIS applications. Best applied when conceptual data modes also resembles a tree.
18
A node (i.e. record) in HDB can be a child node or parent node or both at the same time Main parent node is called root node which is always dummy node Path of accessing a node (record) is always predefined and comes from root through the parent to the child In m-n relationships replication of data creates many problems in updation of data and more resources are required. Hence concept of virtual records was introduced in IBMs IMS (Information Management System)
19
Parent node Node from which outgoing links are present Child node Node where incoming links are present Root node Node where no incoming links Dependent nodes Dependent on other node for their existence Sibling nodes Nodes at same level (Records of the same file) Level Layer in hierarchy Path Way in which any node can be accessed
20
Network Model
Records in the database are organised as collection of arbitrary graphs. Network structure is an arbitrary graph structure (also called plex structure) It supports many-to-many relationships Relationship is implemented using pointers between related records from one file to other which creates overhead in storage space and maintenance time. Popular on mainframe for high volume transaction processing applications Support a wider variety of processing requirements than hierarchical systems Requires significant programming and database design knowledge
21
Relational model represents data and relationships among data by collection of tables, each of which has a number of columns with unique names and number of rows. In relational model, a table is called a relation. A row is called a tuple. A column is called attribute Domain For each attribute, there is a set of permitted values, called domain of that attribute Schema defines structure of a relation. A relation schema consists of a list of attributes and their corresponding domains. An instance of a relation is a time-varying set of tuples (contents of a relation i.e. instance may change with time as the relation is updated) Cardinality of a relation is number of tuples in it Degree of a relation is number of attributes in it
Relational Model
22
Difference between HDB and NDB HDB is an inverted tree structure, NDB is
HDB is an inverted tree structure, NDB is arbitrary graph structure HDB is suitable for hierarchical structure databases, NDB for hierarchical as well as nonhierarchical databases HDB supports one to many relationship but not many-to-many relationship. NDB supports manyto-many relationship In HDB child can be accessed through only one path. In NDB multiple paths are possible for accessing the record. NDB requires significant expertise than HDB NDB supports wider variety of processing requirements than HDB
23
Object Oriented Database In this model, attributes Model that operate on those and methods
attributes are encapsulated in structures called object classes. Class defines prototype for object i.e. object is instance of class Relationships between object classes are shown in part, by nesting or encapsulating one object class within another New object classes are defined from more general object classes (inheritance), so support reusability Objects can exhibit different behaviour in call to same method under different parameters or method belonging to different classes in hierarchy (polymorphism) Complex data types like graphics, video and sound are supported as easily as simpler data types Used in applications involving complex data or event driven programming.
24
Object Oriented Database Model The following principles apply to OOD models
Encapsulation An application (another object) can communicate with an object via messages. Operations provided by an object define the set of messages which can be understood by it; no other operations can be applied to an object. Inheritance - New object classes can be derived from another class (the super-class) by inheritance. The new classes inherit the attributes and methods of the super-class and offer additional attributes and operations. The relation between derived class and its super-class is called isA relationship because an instance of derived class also in an instance of super-class. Polymorphism Derived class may redefine methods of their super-class(es). This is very useful for achieving class-specific behaviour using messages already available for the super-class.
25
The ability to tackle more challenging problem domains Improved communication between the users, analysts, designers and programmers Increased consistency among analysis, design, and programming activities Explicit representation of commonality among system components Robustness of systems Reusability of analysis, design, and programming results Increased consistency among all the models developed during object-oriented analysis, design, and programming
26
One as a multidimensional cube with each cell containing one or more simple attributes and dimensions are ways to categorise the raw data Dimensions are the factors or categories on which users want to summarize or segment data Second view is called star schema.
At the centre is a fact table, equivalent to the cell in multidimensional view. This table contains all the raw attributes and a composite key made up of the primary keys of all the surrounding dimension tables. The surrounding dimension tables define each of the ways to categorise the data
27
Key design issue with multidimensional database model is identifying in advance the lowest common denominator of data i.e. the dimensions or categories by which users wish to summarize the raw facts. Although multidimensional database model is a special case of both the relational and network data models, special DBMs have been developed that optimize dimensional data processing.
28
Excellent for ad-hoc queries, but bad for online transaction processing
29
30
31
Transactional grain finest level Aggregated grain more summarized Finer grains better market basket analysis capability Finer grain more dimension tables, more rows in fact table
Natural duration 13 months or 5 quarters Financial institutions may need longer duration Older data is more difficult to source and cleanse
32