Data Structure-Physical and Logical Arrangement of Data in Files or Databases


COMPONENTS OF DATA STRUCTURE AND HOW IT IS USED TO ACHIEVE DATA PROCESSING STRUCTURES

Data structure- physical and logical arrangement of data in files or databases.

Components of data structure

1. Organization method- the way records are arranged physically on the secondary
storage device.
2. Access method- technique used to locate records and to navigate through the
database or file

Types of Data Structure

1. Flat-file- data files are structured, formatted and arranged to suit the specific
needs of the owner or primary user.
a. Sequential structure- records are stored in a contiguous location that
occupies a specified area of disk space.
b. Indexed structure- in addition to the actual data file, there exists a
separate index that is itself a file of record addresses
- Records are dispersed throughout the disk without regard to
physical proximity to other related records.
c. Virtual storage access method (VSAM)- used for very large files that
require routine batch processing and a moderate degree of individual
record processing.
d. Hashing structure – employs an algorithm that converts the primary key of a
record directly into a storage address.
e. Pointers structure- used to create a linked-list.

2. Hierarchical and Network Database Structures- designed to support flat-file
systems already in place, while allowing the organization to move to new levels
of data integration.
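
The hashing structure listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the division-remainder algorithm, the slot count `NUM_SLOTS`, the sample keys, and the function names are hypothetical choices that the notes do not specify. Records that hash to the same address are chained in one slot, which loosely mirrors how a pointer structure links related records.

```python
NUM_SLOTS = 7  # assumed number of storage locations (hypothetical)

def hash_address(primary_key: int) -> int:
    """Convert a primary key directly into a storage address
    using the division-remainder method (an assumed algorithm)."""
    return primary_key % NUM_SLOTS

def store(disk, record):
    """Place the record in the slot its key hashes to; colliding
    records are chained in the same slot."""
    disk[hash_address(record["key"])].append(record)

def retrieve(disk, primary_key):
    """Go straight to the hashed slot, then scan only that slot's chain."""
    for record in disk[hash_address(primary_key)]:
        if record["key"] == primary_key:
            return record
    return None

disk = [[] for _ in range(NUM_SLOTS)]
for rec in (
    {"key": 15943, "name": "bolt"},
    {"key": 15950, "name": "nut"},     # hashes to the same slot as 15943
    {"key": 20001, "name": "washer"},
):
    store(disk, rec)

print(hash_address(15943), hash_address(15950), retrieve(disk, 15950))
```

No index is needed because the address is computed from the key, but two keys can collide on one address, so a real implementation needs a collision strategy of some kind.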

1|Data Structures and CAATTs for Data Extraction


File Processing Operation                                     Suited Access Structure Method

1. Retrieve a record from the file based on its primary key   Indexed/Hashing structure
2. Insert a record into a file                                Indexed/Hashing/VSAM structure
3. Update a record in the file                                Sequential/Hashing/VSAM/Indexed structure
4. Read a complete file of records                            Sequential/Hashing structure
5. Find the next record in the file                           Sequential structure
6. Scan a file for records with common secondary keys         Indexed structure
7. Delete a record from a file                                Access structure
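
To make the contrast between sequential and indexed access concrete, here is a minimal sketch. It assumes the file is a Python list of records and the index is a separate dictionary of record addresses (list positions); the data and function names are hypothetical.

```python
# The "data file": records stored in arrival order, not key order.
records = [
    {"key": 103, "item": "gasket"},
    {"key": 101, "item": "valve"},
    {"key": 102, "item": "flange"},
]

# The separate index: primary key -> record address (position in the file).
index = {rec["key"]: address for address, rec in enumerate(records)}

def sequential_search(key):
    """Read records one after another until the key is found.
    Returns the record (or None) and the number of records read."""
    reads = 0
    for rec in records:
        reads += 1
        if rec["key"] == key:
            return rec, reads
    return None, reads

def indexed_lookup(key):
    """Consult the index for the address, then read that one record."""
    address = index.get(key)
    return None if address is None else records[address]

print(sequential_search(102))  # reads all three records before a match
print(indexed_lookup(102))     # one index probe, one record read
```

Reading the whole file in order favors the sequential loop, while retrieving a single record by primary key favors the index.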

RELATIONAL DATABASE STRUCTURE, CONCEPTS, AND TERMINOLOGY

Relational Databases – based on the indexed sequential structure

- uses an index in conjunction with a sequential file organization

Relational Database Theory

A system is considered relational if:

1. It represents data in the form of two-dimensional tables
2. It supports the relational algebra functions of restrict, project, and join
a. Restrict - extracts specified rows from a specified table
b. Project – extracts specified attributes from a table to create a virtual table
c. Join – builds a new physical table from two tables consisting of all
concatenated pairs of rows
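
The three functions can be sketched over tables held as lists of dictionaries. This is an illustration, not how a DBMS implements them: the sample tables and column names are hypothetical, and `join` is assumed to be an equi-join that pairs only rows sharing a value in a common attribute.

```python
customers = [
    {"cust_id": 1, "name": "Ace Co", "city": "Austin"},
    {"cust_id": 2, "name": "Birch Ltd", "city": "Boston"},
]
orders = [
    {"order_id": 10, "cust_id": 1, "amount": 250},
    {"order_id": 11, "cust_id": 1, "amount": 75},
    {"order_id": 12, "cust_id": 2, "amount": 120},
]

def restrict(table, predicate):
    """Extract the rows that satisfy a condition."""
    return [row for row in table if predicate(row)]

def project(table, attributes):
    """Extract the named attributes (columns) from every row."""
    return [{a: row[a] for a in attributes} for row in table]

def join(left, right, key):
    """Concatenate each pair of rows that match on the common attribute."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

large_orders = restrict(orders, lambda row: row["amount"] > 100)
names_only = project(customers, ["name"])
order_detail = join(customers, orders, "cust_id")
print(large_orders, names_only, order_detail, sep="\n")
```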

RELATIONAL DATABASE CONCEPTS

Entity - anything about which the organization wishes to capture data

Attributes – the data elements that define an entity

Association – labeled line connecting two entities in a data model

Cardinality – the degree of association between two entities

- describes the number of possible occurrences in one table that are
associated with a single occurrence in a related table
Four basic forms:
a. zero or one (0,1)
- Example: Employee and Company Car
b. one and only one (1,1)
- Example: Manager and Laptop
c. zero or many (0,M)
d. one or many (1,M)
- Example: Customer and Sales Order
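
As a rough illustration of the four forms, the sketch below counts how many occurrences in a related table point back to each occurrence in the first table and reports the pattern seen in the data. The function name and sample keys are hypothetical. Observed data can only suggest a cardinality; the actual cardinality is a business rule set by the organization.

```python
from collections import Counter

def observed_cardinality(parent_keys, child_foreign_keys):
    """Return (minimum, maximum) as seen in the data, where the minimum
    is 0 or 1 and the maximum is 1 or "M". Assumes parent_keys is non-empty."""
    counts = Counter(child_foreign_keys)
    per_parent = [counts.get(key, 0) for key in parent_keys]
    minimum = 0 if min(per_parent) == 0 else 1
    maximum = "M" if max(per_parent) > 1 else 1
    return (minimum, maximum)

customers = [1, 2, 3]
sales_orders = [1, 1, 2, 3, 3]   # customer key carried on each sales order
employees = [10, 11, 12]
company_cars = [10, 12]          # employee key carried on each company car

print(observed_cardinality(customers, sales_orders))  # (1, 'M')
print(observed_cardinality(employees, company_cars))  # (0, 1)
```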

Physical Database Tables

Physical Database tables - are constructed from the data model with each entity in the
model being transformed into a separate physical table

4 Characteristics of a Properly Designed Table

1. The value of at least one attribute in each occurrence must be unique
2. All attribute values in any column must be of the same class
3. Each column in a given table must be uniquely named
4. Tables must conform to the rules of normalization

Anomalies, Structural Dependencies, and Data Normalization

Data Anomalies

a. Update Anomaly - results from data redundancy in an unnormalized table
b. Insertion anomaly
c. Deletion Anomaly – involves the unintentional deletion of data from a table

Normalizing Tables

Normalization process – involves identifying and removing structural dependencies from
the tables under review

2 conditions:

1. All nonkey attributes in the table are dependent on the primary key.
2. All nonkey attributes are independent of the other nonkey attributes.
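
A small sketch may help show what the two conditions mean in practice. In the hypothetical table below, `supplier_phone` depends on `supplier_no`, a nonkey attribute, rather than on the primary key `part_no`, so the phone number is repeated and is open to an update anomaly. The table, column names, and `normalize` function are assumptions made for illustration.

```python
unnormalized = [
    {"part_no": 1, "desc": "bolt", "supplier_no": 22, "supplier_phone": "555-0100"},
    {"part_no": 2, "desc": "nut",  "supplier_no": 22, "supplier_phone": "555-0100"},
    {"part_no": 3, "desc": "gear", "supplier_no": 27, "supplier_phone": "555-0199"},
]

def normalize(rows):
    """Split the table so every nonkey attribute depends only on the
    primary key of the table that holds it."""
    # Parts table: keyed on part_no, keeps supplier_no as a foreign key.
    parts = [
        {"part_no": r["part_no"], "desc": r["desc"], "supplier_no": r["supplier_no"]}
        for r in rows
    ]
    # Supplier table: keyed on supplier_no, stores each phone number once.
    suppliers = {}
    for r in rows:
        suppliers[r["supplier_no"]] = {
            "supplier_no": r["supplier_no"],
            "supplier_phone": r["supplier_phone"],
        }
    return parts, list(suppliers.values())

parts, suppliers = normalize(unnormalized)
print(parts)
print(suppliers)
```

After the split, changing a supplier's phone number touches one row instead of every part that supplier provides.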

DESIGNING RELATIONAL DATABASES

Database Design – a component of a much larger systems development process
that involves extensive analysis of user needs

Six Phases of Database Design / View Modeling

1. Identify Entities
Conditions for Valid Entities:
a. It must consist of two or more occurrences
b. It must contribute at least one attribute that is not provided through other
entities
2. Construct a data model showing entity associations
3. Add primary keys and attributes to the model
4. Normalize the data model and add foreign keys
5. Construct the physical database
6. Prepare the user views
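
The last three phases can be sketched with Python's built-in `sqlite3` module. The two entities, their attributes, and the view name are hypothetical; the point is only to show primary keys, a foreign key that implements the association, and a user view built on the physical tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Phases 4-5: normalized tables with primary and foreign keys.
conn.executescript("""
CREATE TABLE customer (
    cust_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE sales_order (
    order_id INTEGER PRIMARY KEY,
    cust_id  INTEGER NOT NULL REFERENCES customer(cust_id),
    amount   REAL NOT NULL
);
-- Phase 6: a user view derived from the physical tables.
CREATE VIEW customer_totals AS
    SELECT c.cust_id, c.name, SUM(o.amount) AS total
    FROM customer c JOIN sales_order o ON o.cust_id = c.cust_id
    GROUP BY c.cust_id, c.name;
""")

conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Ace Co"), (2, "Birch Ltd")])
conn.executemany("INSERT INTO sales_order VALUES (?, ?, ?)",
                 [(10, 1, 250.0), (11, 1, 75.0), (12, 2, 120.0)])

totals = conn.execute(
    "SELECT name, total FROM customer_totals ORDER BY cust_id"
).fetchall()
print(totals)
```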

EMBEDDED AUDIT MODULE (EAM) / CONTINUOUS AUDITING

- a specially programmed module embedded in a host application to capture
predetermined transaction types for subsequent analysis
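
The idea can be sketched as a hook that the host application calls for every transaction it posts. Everything here is hypothetical: the transaction type, the `AUDIT_THRESHOLD` value, and the function names are assumptions chosen for illustration, not features of any particular system.

```python
AUDIT_THRESHOLD = 10_000  # assumed cutoff set by the auditor (hypothetical)

audit_file = []  # where captured transactions wait for later analysis

def eam_capture(transaction):
    """Embedded module: copy only the predetermined transaction types."""
    if (transaction["type"] == "cash_disbursement"
            and transaction["amount"] >= AUDIT_THRESHOLD):
        audit_file.append(dict(transaction))  # copy, so later changes do not alter the evidence

def post_transaction(ledger, transaction):
    """Host application: normal processing, with the EAM call embedded in it."""
    ledger.append(transaction)
    eam_capture(transaction)

ledger = []
for txn in (
    {"id": 1, "type": "cash_disbursement", "amount": 500},
    {"id": 2, "type": "cash_disbursement", "amount": 25_000},
    {"id": 3, "type": "sale", "amount": 40_000},
):
    post_transaction(ledger, txn)

print(audit_file)
```

Because the hook runs on every posting, it adds work to the host application, which is the operational efficiency concern noted in these notes.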

Disadvantages of EAMs

1. Operational Efficiency
2. Verifying EAM integrity

GENERALIZED AUDIT SOFTWARE

- Uses include:

1) Footing and balancing entire files or selected data items (e.g., extending
inventory)

2) Selecting and reporting detail data

3) Selecting stratified statistical samples from data files

4) Formatting results into audit reports (auto work papers!)

5) Printing confirmations

6) Screening / filtering data

7) Comparing multiple files for differences

8) Recalculating values in data
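
A few of these uses can be sketched in plain Python to show what the tests involve. This is not how any GAS product works internally: the inventory records, field names, stratum boundary, and function names are hypothetical, and one record is deliberately misstated so the recalculation has something to find.

```python
import random

inventory = [
    {"item": "A", "qty": 10, "unit_cost": 2.50,   "extended": 25.00},
    {"item": "B", "qty": 4,  "unit_cost": 100.00, "extended": 400.00},
    {"item": "C", "qty": 7,  "unit_cost": 3.00,   "extended": 12.00},  # misstated on purpose
    {"item": "D", "qty": 1,  "unit_cost": 950.00, "extended": 950.00},
]

def foot(rows, field):
    """Foot (total) one field across the entire file."""
    return round(sum(r[field] for r in rows), 2)

def recalc_exceptions(rows):
    """Recalculate qty * unit_cost and list items whose stored value differs."""
    return [r["item"] for r in rows
            if round(r["qty"] * r["unit_cost"], 2) != round(r["extended"], 2)]

def stratified_sample(rows, boundary, n_per_stratum, seed=0):
    """Split records into low and high strata by extended value and draw
    a random sample from each; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    low = [r for r in rows if r["extended"] < boundary]
    high = [r for r in rows if r["extended"] >= boundary]
    return [rng.sample(s, min(n_per_stratum, len(s))) for s in (low, high)]

print(foot(inventory, "extended"))
print(recalc_exceptions(inventory))
low_sample, high_sample = stratified_sample(inventory, boundary=100, n_per_stratum=1)
```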

- Popular because:

1) GAS is easy to use and requires little computer background

2) Many products are platform independent and work on both mainframes and PCs

3) Auditors can perform tests independently of IT staff

4) GAS can be used to audit the data currently being stored in most file
structures and formats

- Auditing issues:

1) Auditors must sometimes rely on IT personnel to produce files/data

2) Risk that data integrity is compromised by extraction procedures

3) Auditors skilled in programming are better prepared to avoid these pitfalls

ACL

- A proprietary version of GAS

- Leader in the industry

- Designed as an auditor-friendly meta-language (i.e., contains commonly used
auditor tests)

- Access to data is generally easy through the ODBC interface
