CT 3
(Page 238)
Basic terminologies
• Collect data
• Process data
• Store data
A computer system organizes data in a hierarchy that starts with bits and bytes and progresses
to fields, records, files, and databases.
1) Bit: A bit represents the smallest unit of data a computer can handle.
2) Byte: A group of bits, called a byte, represents a single character, which can be a
letter, a number, or another symbol.
3) Field: A grouping of characters into a word, a group of words, or a complete number
(such as a person’s name or age) is called a field.
4) Record (Tuple): A group of related fields, such as the student’s name, the course
taken, the date, and the grade, comprises a record.
A record describes an entity. An entity is a person, place, thing, or event on which we
store and maintain information. Each characteristic or quality describing a particular
entity is called an attribute. For example, Student ID, Course, Date, and Grade are
attributes of the entity COURSE. The specific values that these attributes can have are
found in the fields of the record describing the entity COURSE.
5) File: A group of records of the same type is called a file.
For example, the records in Figure 6.1 could constitute a student course file.
6) Database: A group/collection of related files is called a database.
Most databases contain multiple tables, which may each include several
different fields.
A company database may include tables for products, employees, and
financial records.
The student course file illustrated in Figure 6.1 could be grouped with files on
students’ personal histories and financial backgrounds to create a student database.
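The hierarchy above can be sketched in Python. This is only an illustration under stated assumptions: the field names and sample values are hypothetical and are not taken from Figure 6.1, and plain dictionaries and lists stand in for real files and databases.

```python
# A minimal sketch of the data hierarchy; all names and values are hypothetical.

# Bit and byte: one character stored as one byte (8 bits) in ASCII.
char = "A"
byte_value = char.encode("ascii")[0]   # integer value of the single byte
bits = format(byte_value, "08b")       # the eight bits written as a string

# Record: a group of related fields describing one entity.
# Each key/value pair below is one field.
record = {
    "student_id": 1001,
    "course": "IS 101",
    "date": "Fall",
    "grade": "B+",
}

# File: a group of records of the same type.
course_file = [
    record,
    {"student_id": 1002, "course": "IS 101", "date": "Fall", "grade": "A"},
]

# Database: a collection of related files.
student_database = {
    "course": course_file,
    "personal_history": [],   # placeholder for a related file
    "financial": [],          # placeholder for a related file
}

print(bits)
print(len(student_database["course"]))
```

Each level simply contains the level below it, which is the point of the hierarchy: bits make up bytes, bytes make up fields, fields make up records, records make up files, and files make up the database.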
What are the major capabilities of database management systems (DBMS), and why is a
relational DBMS so powerful?
How a DBMS Solves the Problems of the Traditional File Environment / The Database
Approach to Data Management
A single human resources database provides many different views of data, depending on the
information requirements of the user. Illustrated here are two possible views, one of interest
to a benefits specialist and one of interest to a member of the company’s payroll department.
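The idea of several views over one database can be sketched with Python's built-in sqlite3 module. The table layout, column names, and sample row are assumptions made for illustration; they are not the contents of the figure.

```python
import sqlite3

# One shared human resources table (hypothetical columns and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employee_id INTEGER PRIMARY KEY, name TEXT, "
    "health_plan TEXT, gross_pay REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'A. Example', 'Plan A', 52000.0)")

# Two views over the same data, each exposing only what one user needs.
conn.execute("CREATE VIEW benefits_view AS SELECT name, health_plan FROM employee")
conn.execute("CREATE VIEW payroll_view AS SELECT name, gross_pay FROM employee")

benefits_rows = conn.execute("SELECT * FROM benefits_view").fetchall()
payroll_rows = conn.execute("SELECT * FROM payroll_view").fetchall()
print(benefits_rows)
print(payroll_rows)
```

The data is stored once; the benefits specialist and the payroll department each query their own view rather than keeping separate files.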
Relational DBMS Model
The most popular type of DBMS today for PCs as well as for larger computers and
mainframes is the relational DBMS.
Relational databases represent data as two-dimensional tables (called relations).
Traditional file management techniques make it difficult for organizations to keep track
of all of the pieces of data they use in a systematic way and to organize these data so that
they can be easily accessed. Different functional areas and groups are allowed to
develop their own files independently. Over time, this traditional file management
environment creates problems such as data redundancy and inconsistency, program-data
dependence, inflexibility, poor security, and lack of data sharing and availability. A
database management system (DBMS) solves these problems with software that permits
centralization of data and data management so that businesses have a single consistent
source for all their data needs. Using a DBMS minimizes redundant and inconsistent files.
6-2 What are the major capabilities of DBMS, and why is a relational DBMS so
powerful?
The relational database has been the primary method for organizing and maintaining data
in information systems because it is so flexible and accessible. It organizes data in two-
dimensional tables called relations with rows and columns. Each table contains data about
an entity and its attributes. Each row represents a record, and each column represents an
attribute or field. Each table also contains a key field to uniquely identify each record for
retrieval or manipulation. Relational database tables can be combined easily to deliver
data required by users, provided that any two tables share a common data element. Non-
relational databases are becoming popular for managing types of data that can’t be
handled easily by the relational data model. Both relational and non-relational database
products are available as cloud computing services.
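Combining two tables through a shared data element can be sketched as follows, again with sqlite3. The supplier and part tables, their columns, and the rows are hypothetical; the only point illustrated is that the join works because both tables carry supplier_number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier (
    supplier_number INTEGER PRIMARY KEY,   -- key field
    supplier_name   TEXT
);
CREATE TABLE part (
    part_number     INTEGER PRIMARY KEY,   -- key field
    part_name       TEXT,
    supplier_number INTEGER                -- common data element
);
INSERT INTO supplier VALUES (1, 'Supplier One'), (2, 'Supplier Two');
INSERT INTO part VALUES (10, 'Door latch', 1), (11, 'Door lock', 2);
""")

# Join the two tables on the column they share.
rows = conn.execute(
    "SELECT part.part_name, supplier.supplier_name "
    "FROM part JOIN supplier "
    "ON part.supplier_number = supplier.supplier_number "
    "ORDER BY part.part_number"
).fetchall()
print(rows)
```

Each row of the result pairs a part with the name of its supplier, information that neither table holds on its own.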
Designing a database requires both a logical design and a physical design. The logical
design models the database from a business perspective. The organization’s data model
should reflect its key business processes and decision-making requirements. The process
of creating small, stable, flexible, and adaptive data structures from complex groups of
data when designing a relational database is termed normalization. A well-designed
relational database will not have many-to-many relationships, and all attributes for a
specific entity will only apply to that entity. It will try to enforce referential integrity rules
to ensure that relationships between coupled tables remain consistent. An entity
relationship diagram graphically depicts the relationship between entities (tables) in a
relational database.
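A referential integrity rule can be sketched with a foreign key in SQLite. The tables and values are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued, so that line is part of the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute(
    "CREATE TABLE supplier ("
    "supplier_number INTEGER PRIMARY KEY, supplier_name TEXT)"
)
conn.execute(
    "CREATE TABLE part ("
    "part_number INTEGER PRIMARY KEY, part_name TEXT, "
    "supplier_number INTEGER NOT NULL REFERENCES supplier(supplier_number))"
)
conn.execute("INSERT INTO supplier VALUES (1, 'Supplier One')")
conn.execute("INSERT INTO part VALUES (10, 'Door latch', 1)")  # supplier 1 exists

# A part that points at a supplier that does not exist should be refused.
try:
    conn.execute("INSERT INTO part VALUES (11, 'Door lock', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)

part_count = conn.execute("SELECT COUNT(*) FROM part").fetchone()[0]
```

The DBMS, not the application program, refuses the inconsistent row, which keeps the relationship between the two coupled tables valid.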
6-3 What are the principal tools and technologies for accessing information from
databases to improve business performance and decision making?
Contemporary data management technology has an array of tools for obtaining useful
information from all the different types of data used by businesses today, including semi-
structured and unstructured big data in vast quantities. These capabilities include data
warehouses and data marts, Hadoop, in-memory computing, and analytical platforms.
OLAP represents relationships among data as a multidimensional structure, which can be
visualized as cubes of data and cubes within cubes of data, enabling more sophisticated
data analysis. Data mining analyzes large pools of data, including the contents of data
warehouses, to find patterns and rules that can be used to predict future behavior and
guide decision making. Text mining tools help businesses analyze large unstructured data
sets consisting of text. Web mining tools focus on analysis of useful patterns and
information from the Web, examining the structure of websites and activities of website
users as well as the contents of webpages. Conventional databases can be linked via
middleware to the web or a web interface to facilitate user access to an organization’s
internal data.
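The multidimensional idea behind OLAP can be sketched in plain Python by summing the same facts along different dimensions. The sales figures and the roll_up helper are hypothetical, and a real OLAP tool would do far more than this.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, month, units_sold)
sales = [
    ("nuts", "East", "Jan", 10),
    ("nuts", "West", "Jan", 5),
    ("bolts", "East", "Jan", 7),
    ("nuts", "East", "Feb", 3),
]

def roll_up(facts, dims):
    """Sum units over the chosen dimensions (0=product, 1=region, 2=month)."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dims)
        totals[key] += fact[3]
    return dict(totals)

# The same data viewed along two different sets of dimensions.
by_product_region = roll_up(sales, (0, 1))
by_month = roll_up(sales, (2,))
print(by_product_region)
print(by_month)
```

Choosing which dimensions to keep corresponds to looking at a different face or slice of the data cube.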
6-4 Why are information policy, data administration, and data quality assurance
essential for managing the firm’s data resources?
In sequences, events are linked over time. We might find, for example, that if a house is
purchased, a new refrigerator will be purchased within two weeks 65 percent of the time,
and an oven will be bought within one month of the home purchase 45 percent of the
time.
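A sequence rule of this kind can be estimated by counting how often one event follows another within a time window. The sketch below uses a made-up purchase log and a hypothetical follow_rate helper; the percentages in the text come from the source, not from this code.

```python
from datetime import date

# Hypothetical purchase log: customer -> {item: purchase date}
purchases = {
    "c1": {"house": date(2024, 1, 1), "refrigerator": date(2024, 1, 10)},
    "c2": {"house": date(2024, 2, 1), "refrigerator": date(2024, 3, 15)},
    "c3": {"house": date(2024, 3, 1)},
    "c4": {"house": date(2024, 4, 1), "refrigerator": date(2024, 4, 5)},
}

def follow_rate(log, first, then, within_days):
    """Share of customers who bought `then` within `within_days` after `first`."""
    base = [p for p in log.values() if first in p]
    hits = [
        p for p in base
        if then in p and 0 <= (p[then] - p[first]).days <= within_days
    ]
    return len(hits) / len(base) if base else 0.0

# Two of the four home buyers bought a refrigerator within 14 days.
rate = follow_rate(purchases, "house", "refrigerator", 14)
print(rate)
```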
Classification recognizes patterns that describe the group to which an item belongs by
examining existing items that have been classified and by inferring a set of rules. For
example, businesses such as credit card or telephone companies worry about the loss of
steady customers. Classification helps discover the characteristics of customers who are
likely to leave and can provide a model to help managers predict who those customers are
so that the managers can devise special campaigns to retain such customers.
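Inferring a rule from already classified items can be sketched with a very small example. The data, the single attribute (support calls), and the one-threshold rule are all assumptions chosen for brevity; real classification tools use many attributes and more capable models.

```python
# Hypothetical labeled customers: (support_calls_last_year, left_company)
history = [(0, False), (1, False), (2, False), (5, True), (6, True), (4, True)]

def learn_threshold(examples):
    """Pick the cutoff t so that 'calls >= t means likely to leave' fits best."""
    best_t, best_correct = None, -1
    for t in sorted({calls for calls, _ in examples}):
        # Count how many known customers the rule would classify correctly.
        correct = sum((calls >= t) == left for calls, left in examples)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t

threshold = learn_threshold(history)

def likely_to_leave(calls):
    """Apply the inferred rule to a customer who has not been classified yet."""
    return calls >= threshold

print(threshold)
print(likely_to_leave(7), likely_to_leave(1))
```

The rule is learned from customers whose outcome is already known and then applied to predict which current customers may leave.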
Clustering works in a manner similar to classification when no groups have yet been
defined. A data mining tool can discover different groupings within data, such as finding
affinity groups for bank cards or partitioning a database into groups of customers based
on demographics and types of personal investments.
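Discovering groups when none are defined in advance can be sketched with a tiny one-dimensional k-means routine. The customer ages and the choice of two starting centers are hypothetical, and k-means is only one of several clustering techniques; the source does not name a specific algorithm.

```python
def kmeans_1d(values, centers, iterations=10):
    """Assign each number to its nearest center, then move each center to its group's mean."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups, centers

# Hypothetical customer ages with no predefined groups.
ages = [22, 25, 27, 61, 64, 67]
groups, centers = kmeans_1d(ages, centers=[min(ages), max(ages)])
print(groups)
print(centers)
```

No labels are supplied; the two groups emerge from the data alone, which is the difference from classification.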
Data cleansing, also known as data scrubbing, consists of activities for detecting and
correcting data in a database that are incorrect, incomplete, improperly formatted, or
redundant. Data cleansing not only corrects errors but also enforces consistency among
different sets of data that originated in separate systems. Specialized data-cleansing
software is available to automatically survey data files, correct errors in the data, and
integrate the data in a consistent companywide format.
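A few of these cleansing steps can be sketched in Python: standardizing formats, flagging incomplete rows, and dropping redundant ones. The records, field names, and rules are hypothetical, and commercial data-cleansing software covers far more cases.

```python
# Hypothetical customer rows merged from two separate systems.
raw = [
    {"name": "  alice smith ", "phone": "555-0100"},
    {"name": "Alice Smith", "phone": "(555) 0100"},
    {"name": "Bob Lee", "phone": ""},
]

def clean(rows):
    """Return (cleaned rows, names of rows set aside as incomplete)."""
    seen, cleaned, incomplete = set(), [], []
    for row in rows:
        # Improperly formatted: normalize spacing, capitalization, and phone digits.
        name = " ".join(row["name"].split()).title()
        phone = "".join(ch for ch in row["phone"] if ch.isdigit())
        # Incomplete: set the row aside for follow-up instead of guessing.
        if not phone:
            incomplete.append(name)
            continue
        # Redundant: skip rows that match one already kept.
        key = (name, phone)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "phone": phone})
    return cleaned, incomplete

cleaned, incomplete = clean(raw)
print(cleaned)
print(incomplete)
```

After cleaning, the two differently formatted rows for the same customer collapse into one consistent record.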