
DATABASE CSC 809 PRESENTATIONS

GROUP 4:

Client/Server System, Data Warehouse, Data Mining, Database in E-Commerce, Web Database
Development and Database Administration.

1. Hassan Issa Ndana

2. Alausa Babatunde

3. Atanda Maruf Oladele

4. Olayiwola Folorunso Oluyomi

5. Adimula Oluwamayowa Anne

CLIENT/SERVER SYSTEM
PRESENTED BY: ISSA HASSAN NDANA UIL/PG2022/0269
INTRODUCTION
• Storing Data and the Database
– Data is information in its simplest form; it is meaningless until related together in some
fashion so as to become meaningful.
– All data on a computer is stored in one kind of database or another.

DATABASE SYSTEM ARCHITECTURES


• Client/Server database system
• Distributed database system
• Parallel database system
• Centralized database system

CLIENT/SERVER IN RESPECT OF DATABASES


• A Database Management System (DBMS) lies at the center of most Client/Server systems in
use today.
– Provides transparent data access to multiple and heterogeneous clients.
– Processes client data requests at the local server.
– Sends only the SQL result to the clients over the network.

CLIENT/SERVER DATABASE COMPUTING

[Figure: a client running an application sends a query to the server and receives the query result over the network.]
• Client/Server database computing can be defined as the logical partition of the user interface,
database management, and business logic between the client computer and the server
computer.
• Business logic can be located on the server, on the client, or mixed between the two.
• Following are the reasons for its popularity.
– Affordability
– Speed
– Adaptability
– Simplified data access
CLIENT/SERVER DATABASE ARCHITECTURE
Various types of Client/Server database architecture are available:

1. Process-per-client architecture

[Figure: each client is served by its own dedicated server process, which accesses the database.]

2. Multi-threaded architecture

[Figure: a single multi-threaded server process serves all clients.]

3. Hybrid architecture

[Figure: a listener and dispatcher route client requests to a shared pool of server processes that access the database.]
DATABASE MIDDLEWARE COMPONENT

[Figure: the client front-end calls the middleware API; the database middleware contains a database translator and a network translator, which communicate with the server over the network protocol.]

The database middleware consists of three main components:

 Application Programming Interface
 Database Translator
 Network Translator

1) Application Programming Interface


• The application-programming interface is public to the client application.
• The middleware API allows the programmer to write generic SQL code instead of code
specific to each database server.
• The server can be changed without requiring that the client applications be completely
rewritten.
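A minimal sketch of this idea in Python, using the standard DB-API style of database access (the table name, columns, and in-memory SQLite database are hypothetical stand-ins for a real server): the application writes generic SQL against one interface, and only the driver/connection line would change to target a different database server.

import sqlite3  # one DB-API driver; a driver for another DBMS could be swapped in here

def list_customers(conn):
    # Generic SQL issued through the common API, not through server-specific calls.
    cur = conn.cursor()
    cur.execute("SELECT id, name FROM customers")
    return cur.fetchall()

# An in-memory database stands in for a remote server (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace')")
print(list_customers(conn))  # the same function works unchanged with another driver
conn.close()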
2) Database Translator
• The database translator translates the SQL requests into the specific database server
syntax.
• The database translator layer takes the generic SQL request and maps it to the database
server’s SQL protocol.
• If the SQL request uses data from two different database servers, the database translator
layer will take care of communicating with each server.
3) Network Translator
The network translator manages the network communication protocols.
• If a client application taps into two databases, one that uses TCP/IP and another that
uses IPX/SPX, the network layer handles all the communication details of each database
transparently to the client application.

DISTRIBUTED CLIENT/SERVER DATABASE SYSTEMS


• Distributed data refers to the basic data stored on the server, which is distributed to different
members of the work team.
• Distributed processing, on the other hand, refers to the way different tasks are organized among
members of the work team.
• The data in the database can be partitioned in several ways, and processing can be centralized,
partitioned, or replicated in many different ways.

DISTRIBUTED DBMS
1. The DBMS must provide distributed database transparency features like:

 Distribution transparency

 Transaction transparency

 Failure transparency

 Performance transparency

 Heterogeneity transparency

2. Interaction between client and server might proceed as follows during the processing of an
SQL query:

 The client parses a user query and decomposes it into a number of independent site
queries. Each site query is sent to the appropriate server site.

 Each server processes its local query and sends the resulting relation to the client site.

 The client site combines the results of the subqueries to produce the result of the
originally submitted query.
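A minimal sketch of these three steps in Python (the employees table, the two in-memory SQLite "sites", and the sample rows are hypothetical): the client sends the same site query to each server, each site returns its local result relation, and the client combines the partial results.

import sqlite3

def make_site(rows):
    # A hypothetical server site holding one horizontal partition of an employees table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
    return conn

site_a = make_site([("Ada", "IT"), ("Bode", "HR")])
site_b = make_site([("Chidi", "IT")])

# The client decomposes the user query into independent site queries...
site_query = "SELECT name FROM employees WHERE dept = 'IT'"

# ...each server processes its local query and returns its result relation...
partial_results = [site.execute(site_query).fetchall() for site in (site_a, site_b)]

# ...and the client combines the sub-results into the answer to the original query.
combined = sorted(row[0] for result in partial_results for row in result)
print(combined)  # ['Ada', 'Chidi']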
3. In a typical DBMS, it is customary to divide the software module into three levels:

 L1: The server software is responsible for local data management at a site, much like
centralized DBMS software.
 L2: The client software is responsible for most of the distribution functions; it accesses data
distribution information from the DBMS catalog and processes all requests that require
access to more than one site. It also handles all user interfaces.

 L3: The communication software provides the communication primitives that are used
by the client to transmit commands and data among the various sites as needed.

WEB/DATABASE SYSTEM FOR CLIENT/SERVER APPLICATIONS


• A client machine that runs a web browser issues a request for information in the form of a
URL reference.
• This reference triggers a program at the web server that issues the correct database command
to a database server.
• The output returned to the web server is converted into HTML format and returned to the
web browser.
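A minimal sketch of this round trip, using only the Python standard library (http.server and sqlite3); the database file, table, and columns are hypothetical. The browser's URL request triggers the handler at the web server, the handler issues the database command, and the result is converted into HTML and returned to the browser.

import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

DB_PATH = "shop.db"  # hypothetical database created and populated elsewhere

class ProductHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # 1. The URL reference triggers this program at the web server,
        #    which issues the correct command to the database.
        conn = sqlite3.connect(DB_PATH)
        rows = conn.execute("SELECT name, price FROM products").fetchall()
        conn.close()
        # 2. The database output is converted into HTML...
        items = "".join(f"<li>{name}: {price}</li>" for name, price in rows)
        body = f"<html><body><ul>{items}</ul></body></html>"
        # 3. ...and returned to the web browser.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), ProductHandler).serve_forever()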

TECHNOLOGIES FOR CLIENT/SERVER APPLICATION


• Rich transaction processing
• Roaming agents
• Rich data management
• Intelligent self-managing entities
• Intelligent middleware

SERVICE OF A CLIENT/SERVER APPLICATION

Database centered systems


• Decision-Support Systems (DSS)
• Online Transaction Processing (OLTP)
Groupware
• Multimedia Document Management (MMDM)
• Workflow
• Scheduling (or Calendaring)
• Conferencing
• Electronic Mail (E-mail)

CATEGORIES OF CLIENT/SERVER APPLICATIONS

 Host-based processing

 Server-based processing

 Client-based processing

 Cooperative processing

CLIENT SERVICES
• Responsible for managing the user interface.
• Provides presentation services.
• Accepts and checks the syntax of user inputs. User input and final output, if any, are presented
at the client workstation.
• Acts as a consumer of services provided by one or more server processes.

SERVER SERVICES
• Some of the main operations that servers perform are listed below:
– Accepts and processes database requests from clients.
– Checks authorization.
– Ensures that integrity constraints are not violated.
– Performs query/update processing and transmits responses to clients.
– Maintains the system catalog.
– Provides concurrent database access.
– Provides recovery control.

DATA WAREHOUSE
PRESENTED BY: ALAUSA BABATUNDE UIL/PG2022/0096
Introduction to Data warehouse

A data warehouse is a large, centralized repository of integrated data from various sources
within an organization. A data warehouse is used to store historical and current data in a single
place, which is used for creating analytical reports and gaining business insights.

A data warehouse is a type of data management system that is designed for efficient querying,
reporting, and analysis of business data to support decision-making processes.

Characteristics of data warehouses

Data warehouses possess several key characteristics that distinguish them from other types of
databases. These characteristics are crucial for supporting the analytical and reporting needs of
organizations.

1. Integrated Data:

Data warehouses integrate data from various sources within an organization. This integration
ensures that data is consistent, standardized, and can be easily compared and analyzed.

2. Time-Variant:

Data warehouses store historical data, allowing users to analyze changes and trends over
time.

3. Non-Volatile:

Once data is loaded into a data warehouse, it is typically not updated or deleted. Instead,
changes are tracked through the addition of new records.

4. Decision Support System:

The purpose of a data warehouse is to provide decision-makers with the necessary information
and insights to make informed and strategic decisions.

5. Scalability:
Scalability ensures that the data warehouse can support increasing volumes of data as an
organization grows.

6. Security and Access Control:

Data warehouses implement robust security measures to protect sensitive information. Access
control mechanisms ensure that users have appropriate permissions to access and manipulate
the data.

Comparison between a database and a data warehouse

Databases are optimized for transactional processing and the efficient management of day-to-day
operations, while data warehouses are designed to support analytical processing and
decision-making by providing a consolidated and historical view of data. They complement each
other in an organization's data architecture, serving different but complementary purposes.

In general, there are key differences in terms of their design, functionality, and use cases.
Below are the main differences between a database and a data warehouse:

1. Purpose

Database: Primarily designed for transactional processing and the efficient management of
day-to-day operations. It supports functions like data insertion, deletion, and modification.

Data Warehouse: Designed for analytical processing and to support decision-making processes.
It focuses on providing a consolidated and historical view of data for reporting and analysis.

2. Data Structure:

Database: Typically follows a normalized data structure to minimize redundancy and maintain data
integrity. Emphasizes transactional consistency.
Data Warehouse: Often follows a denormalized or partially denormalized data structure optimized for
analytical querying. Emphasizes query performance and ease of analysis.

3. Data Volume:

Database: Handles moderate to large volumes of transactional data generated by day-to-day
operations.
Data Warehouse: Handles large volumes of historical data from various sources, supporting complex
analytical queries.

4. Query Complexity:

Database: Queries are typically short and simple, reading or updating a small number of records
at a time.
Data Warehouse: Queries are complex analytical queries that aggregate and join large volumes of
data, often expressed over dimensional models such as a star or snowflake schema.
5. Data Load and Refresh:
Database: Real-time or near-real-time data updates are common for transactional systems.
Data Warehouse: Typically involves periodic batch loading of data, with scheduled refreshes to
maintain historical records.

6. Performance Optimization:

Database: Optimized for transactional processing, with a focus on maintaining data consistency and
concurrency control.
Data Warehouse: Optimized for analytical processing, with a focus on query performance over large
volumes of data.

Data warehouse Schema Design

The schema defines the structure of the data stored in the warehouse and plays a crucial
role in determining how easily and quickly users can query and analyze the data.

A data warehouse may use a star schema, snowflake schema, a hybrid approach, or other
dimensional models to facilitate efficient analytical querying.
Star Schema - A star schema is a data modeling technique used in data warehouses to represent
data in a structured way. In a star schema, data is organized into a central fact table that is connected to
one or more dimension tables, forming a star-like structure.

Fact table: the central table in a star schema is known as the fact table, and it contains quantitative
data (also referred to as measures or metrics) that is of interest to the user or organization.

Dimension tables: the dimension tables in a star schema contain the descriptive attributes for the
quantitative data stored in the fact table. These tables hold the information that is used to filter, group,
and aggregate the quantitative data in the fact table.
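A minimal sketch of a star schema in SQL, executed here through Python's sqlite3 module (all table and column names are hypothetical): three dimension tables surround one central fact table, and a typical analytical query joins the fact table to its dimensions to filter, group, and aggregate the measures.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the descriptive attributes used to filter, group and aggregate.
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, day INTEGER, month INTEGER, year INTEGER);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- The central fact table holds the quantitative measures plus a foreign key
-- to each dimension, forming the 'star'.
CREATE TABLE fact_sales (
    date_id    INTEGER REFERENCES dim_date(date_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units_sold INTEGER,
    revenue    REAL
);
""")

# A typical analytical query: total revenue per product category and year.
query = """
SELECT p.category, d.year, SUM(f.revenue)
FROM fact_sales f
JOIN dim_product p ON f.product_id = p.product_id
JOIN dim_date    d ON f.date_id    = d.date_id
GROUP BY p.category, d.year;
"""
print(conn.execute(query).fetchall())  # empty until fact and dimension rows are loaded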

Snowflake schema - A snowflake schema is a multi-dimensional data model that is an extension
of the star schema, where dimension tables are broken down into multiple related tables
(sub-dimensions).
DATA MINING
PRESENTED BY:
ATANDA MARUF OLADELE
14/52HA138

Introduction
Data mining is one of the most useful techniques that help entrepreneurs, researchers, and individuals
to extract valuable information from huge sets of data. Data mining is also called Knowledge
Discovery in Database (KDD). The knowledge discovery process includes Data cleaning, Data
integration, Data selection, Data transformation, Data mining, Pattern evaluation, and Knowledge
presentation.
These notes on data mining cover the main topics of the field, such as applications, data mining
vs machine learning, data mining tools, social media data mining, data mining techniques,
clustering in data mining, challenges in data mining, etc.
What is Data Mining?
The process of extracting information to identify patterns, trends, and useful data that would allow the
business to take the data-driven decision from huge sets of data is called Data Mining.
In other words, we can say that Data Mining is the process of investigating hidden patterns of
information from various perspectives and categorizing it into useful data, which is collected and
assembled in particular areas such as data warehouses, analyzed efficiently with data mining
algorithms, and used to support decision-making and, ultimately, to cut costs and generate revenue.
Data mining is the act of automatically searching large stores of information to find trends and
patterns that go beyond simple analysis procedures. Data mining utilizes complex mathematical
algorithms to segment the data and evaluate the probability of future events. Data Mining is also called
Knowledge Discovery of Data (KDD).
Data Mining is a process used by organizations to extract specific data from huge databases to solve
business problems. It primarily turns raw data into useful information.
Data Mining is similar to Data Science carried out by a person, in a specific situation, on a particular
data set, with an objective. This process includes various types of services such as text mining, web
mining, audio and video mining, pictorial data mining, and social media mining. It is done through
software that is simple or highly specific. By outsourcing data mining, all the work can be done faster
with low operation costs. Specialized firms can also use new technologies to collect data that is
impossible to locate manually. There are tonnes of information available on various platforms, but
very little knowledge is accessible. The biggest challenge is to analyze the data to extract important
information that can be used to solve a problem or for company development. There are many
powerful instruments and techniques available to mine data and find better insight from it.

Types of Data Mining


Data mining can be performed on the following types of data:

Relational Database
A relational database is a collection of multiple data sets formally organized by tables, records, and
columns from which data can be accessed in various ways without having to reorganize the database
tables. Tables convey and share information, which facilitates data searchability, reporting, and
organization.

Data warehouses
A Data Warehouse is the technology that collects the data from various sources within the organization
to provide meaningful business insights. The huge amount of data comes from multiple places such as
Marketing and Finance. The extracted data is utilized for analytical purposes and helps in decision-
making for a business organization. The data warehouse is designed for the analysis of data rather than
transaction processing.

Data Repositories
The Data Repository generally refers to a destination for data storage. However, many IT professionals
utilize the term more specifically to refer to a particular kind of setup within an IT structure.
For example, a group of databases, where an organization has kept various kinds of information.

Object-Relational Database
A combination of an object-oriented database model and relational database model is called an object-
relational model. It supports Classes, Objects, Inheritance, etc.
One of the primary objectives of the Object-relational data model is to close the gap between the
Relational database and the object-oriented model practices frequently utilized in many programming
languages, for example, C++, Java, C#, and so on.
Transactional Database
A transactional database refers to a database management system (DBMS) that has the potential to
undo a database transaction if it is not performed appropriately. Even though this was a unique
capability a long time ago, today most relational database systems support transactional
database activities.
Advantages of Data Mining
1. The Data Mining technique enables organizations to obtain knowledge-based data.
2. Data mining enables organizations to make lucrative modifications in operation and
production.
3. Compared with other statistical data applications, data mining is cost-efficient.
4. Data Mining helps the decision-making process of an organization.
5. It Facilitates the automated discovery of hidden patterns as well as the prediction of
trends and behaviors.
6. It can be introduced into new systems as well as existing platforms.
7. It is a quick process that makes it easy for new users to analyze enormous amounts of
data in a short time.
Disadvantages of Data Mining
1. There is a probability that the organizations may sell useful data of customers to other
organizations for money. As per the report, American Express has sold credit card
purchases of their customers to other organizations.
2. Much data mining analytics software is difficult to operate and needs advanced training
to work on.
3. Different data mining instruments operate in distinct ways due to the different
algorithms used in their design. Therefore, the selection of the right data mining tools is
a very challenging task.
4. Data mining techniques are not perfectly precise, so they may lead to severe consequences
in certain conditions.
Applications of Data Mining
Data Mining is primarily used by organizations with intense consumer demands (retail,
communication, financial, and marketing companies) to determine prices, consumer preferences,
product positioning, and the impact on sales, customer satisfaction, and corporate profits. Data mining
enables a retailer to use point-of-sale records of customer purchases to develop products and
promotions that help the organization attract customers.

The following are the areas where data mining is widely used:
Data Mining in Healthcare
Data mining in healthcare has excellent potential to improve the health system. It uses data and
analytics for better insights and to identify best practices that will enhance health care services and
reduce costs. Analysts use data mining approaches such as Machine learning, multi-dimensional
database, Data visualization, soft computing, and statistics. Data Mining can be used to forecast
patients in each category. The procedures ensure that the patients get intensive care at the right place
and at the right time. Data mining also enables healthcare insurers to recognize fraud and abuse.
Data Mining in Market Basket Analysis
Market basket analysis is a modeling method based on a hypothesis. If you buy a specific group of
products, then you are more likely to buy another group of products. This technique may enable the
retailer to understand the purchase behavior of a buyer. This data may assist the retailer in
understanding the requirements of the buyer and altering the store's layout accordingly. An
analytical comparison of results can also be made between various stores and between customers in
different demographic groups.
Data mining in Education
Education data mining is a newly emerging field, concerned with developing techniques that discover
knowledge from the data generated in educational environments. EDM objectives include predicting
students' future learning behavior, studying the impact of educational support, and promoting learning
science. An organization can use data mining to make precise decisions and also to predict students'
results. With the results, the institution can concentrate on what to teach and how to teach.
Data Mining in Manufacturing Engineering
Knowledge is the best asset possessed by a manufacturing company. Data mining tools can be
beneficial to find patterns in a complex manufacturing process. Data mining can be used in system-
level designing to obtain the relationships between product architecture, product portfolio, and data
needs of the customers. It can also be used to forecast the product development period, cost, and
expectations among the other tasks.
Data Mining in CRM (Customer Relationship Management)
Customer Relationship Management (CRM) is all about obtaining and holding Customers, also
enhancing customer loyalty and implementing customer-oriented strategies. To get a decent
relationship with the customer, a business organization needs to collect data and analyze the data. With
data mining technologies, the collected data can be used for analytics.

Data Mining in Fraud detection


Billions of dollars are lost to fraud. Traditional methods of fraud detection are somewhat
time-consuming and complicated. Data mining provides meaningful patterns and turns data into
information. An ideal fraud detection system should protect the data of all the users. Supervised
methods consist of a collection of sample records, and these records are classified as fraudulent or
non-fraudulent. A model is constructed using this data, and the technique is made to identify whether
the document is fraudulent or not.
Data Mining in Lie Detection
Apprehending a criminal is not a big deal, but bringing out the truth from him is a very challenging
task. Law enforcement may use data mining techniques to investigate offenses, monitor suspected
terrorist communications, etc. This technique includes text mining also, and it seeks meaningful
patterns in data, which is usually unstructured text. The information collected from the previous
investigations is compared, and a model for lie detection is constructed.
Data Mining Financial Banking
The digitalization of the banking system generates an enormous amount of data with
every new transaction. Data mining techniques can help bankers solve business-related
problems in banking and finance by identifying trends, causalities, and correlations in business
information and market costs that are not immediately evident to managers or executives, because the
data volume is too large or is produced too rapidly for experts to analyze. Managers may use this
data to better target, acquire, retain, and segment profitable customers.
Challenges of Implementation in Data mining
Although data mining is very powerful, it faces many challenges during its execution. Various
challenges could be related to performance, data, methods, and techniques, etc. The process of data
mining becomes effective when the challenges or problems are correctly recognized and adequately
resolved.

Incomplete and Noisy Data


The process of extracting useful data from large volumes of data is data mining. Data in the real
world is heterogeneous, incomplete, and noisy. Data in huge quantities will usually be inaccurate or
unreliable. These problems may occur due to errors in data-measuring instruments or because of human
errors. Suppose a retail chain collects the phone numbers of customers who spend more than $500, and the
accounting employees put the information into their system. The person may make a digit mistake
when entering the phone number, which results in incorrect data. Even some customers may not be
willing to disclose their phone numbers, which results in incomplete data. The data could also get changed
due to human or system error. All these consequences (noisy and incomplete data) make data mining
challenging.
Data Distribution
Real-world data is usually stored on various platforms in a distributed computing environment. It
might be in a database, in individual systems, or even on the internet. Practically, it is quite a tough task
to bring all the data into a centralized data repository, mainly due to organizational and technical
concerns. For example, various regional offices may have their own servers to store their data. It is not
feasible to store all the data from all the offices on a central server. Therefore, data mining requires
the development of tools and algorithms that allow the mining of distributed data.
Complex Data
Real-world data is heterogeneous, and it could be multimedia data, including audio and video, images,
complex data, spatial data, time series, and so on. Managing these various types of data and extracting
useful information is a tough task. Most of the time, new technologies, new tools, and methodologies
would have to be refined to obtain specific information.
Performance
The data mining system's performance relies primarily on the efficiency of algorithms and techniques
used. If the designed algorithm and techniques are not up to the mark, then the efficiency of the data
mining process will be affected adversely.
Data Privacy and Security
Data mining usually leads to serious issues in terms of data security, governance, and privacy. For
example, if a retailer analyzes the details of the purchased items, then it reveals data about buying
habits and preferences of the customers without their permission.
Data Visualization
In data mining, data visualization is a very important process because it is the primary method that
shows the output to the user in a presentable way. The extracted data should convey the exact meaning
of what it intends to express. But many times, representing the information to the end-user in a precise
and easy way is difficult. Because the input data and the output information are complicated, very
efficient and successful data visualization processes need to be implemented.
There are many more challenges in data mining in addition to the above-mentioned problems. More
problems are disclosed as the actual data mining process begins, and the success of data mining relies
on overcoming all these difficulties.

Data Mining Techniques


Data mining includes the utilization of refined data analysis tools to find previously unknown, valid
patterns and relationships in huge data sets. These tools can incorporate statistical models, machine
learning techniques, and mathematical algorithms, such as neural networks or decision trees. Thus,
data mining incorporates analysis and prediction.
Depending on various methods and technologies from the intersection of machine learning, database
management, and statistics, professionals in data mining have devoted their careers to better
understanding how to process and draw conclusions from huge amounts of data. But what are the
methods they use to make it happen?
In recent data mining projects, various major data mining techniques have been developed and
used, including association, classification, clustering, prediction, sequential patterns, and regression.
1. Classification
This technique is used to obtain important and relevant information about data and metadata. This data
mining technique helps to classify data in different classes.
Data mining techniques can be classified by different criteria, as follows:
i. Classification of data mining frameworks as per the type of data sources mined:
This classification is as per the type of data handled, for example, multimedia, spatial data,
text data, time-series data, World Wide Web data, and so on.
ii. Classification of data mining frameworks as per the database involved:
This classification is based on the data model involved, for example, object-oriented database,
transactional database, relational database, and so on.
iii. Classification of data mining frameworks as per the kind of knowledge discovered:
This classification depends on the types of knowledge discovered or the data mining
functionalities, for example, discrimination, classification, clustering, characterization, etc.
Some frameworks tend to be comprehensive frameworks offering several data mining
functionalities together.
iv. Classification of data mining frameworks according to the data mining techniques used:
This classification is as per the data analysis approach utilized, such as neural networks,
machine learning, genetic algorithms, visualization, statistics, data warehouse-oriented or
database-oriented approaches, etc.
The classification can also take into account the level of user interaction involved in the data mining
procedure, such as query-driven systems, autonomous systems, or interactive exploratory systems.
2. Clustering
Clustering is a division of information into groups of connected objects. Describing the data by a few
clusters inevitably loses certain fine details, but achieves simplification. It models data by its
clusters. From a historical point of view, data modeling places clustering in a framework rooted in statistics,
mathematics, and numerical analysis. From a machine learning point of view, clusters relate to hidden
patterns, the search for clusters is unsupervised learning, and the resulting framework represents a
data concept. From a practical point of view, clustering plays an extraordinary role in data mining
applications, for example, scientific data exploration, text mining, information retrieval, spatial
database applications, CRM, web analysis, computational biology, medical diagnostics, and much
more.
In other words, we can say that clustering analysis is a data mining technique used to identify similar data.
This technique helps to recognize the differences and similarities between the data. Clustering is very
similar to classification, but it involves grouping chunks of data together based on their
similarities.
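As an illustration, here is a minimal sketch of one clustering algorithm (1-D k-means) in Python; the spending figures and the choice of k = 2 are hypothetical. Each value is assigned to its nearest centre, and each centre is then moved to the mean of the values assigned to it.

import random

def kmeans_1d(values, k=2, iters=20):
    centres = random.sample(values, k)            # pick k initial centres from the data
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(centres[i] - v))
            groups[nearest].append(v)             # assign each value to its nearest centre
        centres = [sum(g) / len(g) if g else centres[i]   # move each centre to its group mean
                   for i, g in enumerate(groups)]
    return centres, groups

# Hypothetical data: daily spend of two distinct customer segments.
spend = [5, 6, 7, 8, 52, 55, 60, 58]
centres, clusters = kmeans_1d(spend)
print(centres)   # roughly [6.5, 56.25]
print(clusters)  # the low spenders and the high spenders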
3. Regression
Regression analysis is a data mining process used to identify and analyze the relationship between
variables in the presence of other factors. It is used to estimate the value or probability of a specific
variable. Regression is primarily a form of planning and modeling. For example, we might use it to
project certain costs, depending on other factors such as availability, consumer demand, and
competition. Primarily, it gives the exact relationship between two or more variables in the given data
set.
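A minimal sketch of the simplest case, ordinary least-squares fitting of a straight line in Python; the demand and cost figures are hypothetical. The fitted slope and intercept describe the relationship between the two variables and can then be used to project a value.

def fit_line(xs, ys):
    # Ordinary least squares for y = a*x + b with a single predictor.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical example: projecting cost from consumer demand.
demand = [10, 20, 30, 40, 50]
cost = [115, 198, 310, 405, 490]
a, b = fit_line(demand, cost)
print(a, b)        # fitted slope and intercept
print(a * 60 + b)  # projected cost at a demand of 60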
4. Association Rules
This data mining technique helps to discover a link between two or more items. It finds a hidden
pattern in the data set.
Association rules are if-then statements that help to show the probability of interactions between
data items within large data sets in different types of databases. Association rule mining has several
applications and is commonly used to discover sales correlations in transactional data or in medical data sets.
The way the algorithm works is that you have various data, for example, a list of grocery items that
you have been buying for the last six months. It calculates the percentage of items being purchased
together.
These are the three major measurement techniques:
o Lift:
This measurement technique measures the accuracy of the confidence over how often item B is
purchased.
Lift = Confidence / (Transactions containing item B / Entire dataset)
o Support:
This measurement technique measures how often multiple items are purchased together,
compared to the overall dataset.
Support = (Transactions containing both item A and item B) / (Entire dataset)
o Confidence:
This measurement technique measures how often item B is purchased when item A is
purchased as well.
Confidence = (Transactions containing both item A and item B) / (Transactions containing item A)
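A minimal sketch of these three measures for a rule A -> B in Python; the grocery transactions and the chosen items are hypothetical.

transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
]
A, B = "bread", "butter"

n = len(transactions)
count_a = sum(1 for t in transactions if A in t)
count_b = sum(1 for t in transactions if B in t)
count_ab = sum(1 for t in transactions if A in t and B in t)

support = count_ab / n               # (A and B) / entire dataset
confidence = count_ab / count_a      # (A and B) / A
lift = confidence / (count_b / n)    # confidence / support of B

print(support, confidence, lift)     # 0.4 0.5 1.25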
5. Outlier detection
This type of data mining technique relates to the observation of data items in the data set which do not
match an expected pattern or expected behavior. This technique may be used in various domains like
intrusion detection, fraud detection, etc. It is also known as Outlier Analysis or Outlier Mining. The
outlier is a data point that diverges too much from the rest of the dataset. The majority of the real-
world datasets have an outlier. Outlier detection plays a significant role in the data mining field.
Outlier detection is valuable in numerous fields like network interruption identification, credit or debit
card fraud detection, detecting outlying in wireless sensor network data, etc.
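A minimal sketch of one simple outlier-detection rule in Python (flagging values that lie more than two standard deviations from the mean); the card transaction amounts are hypothetical.

def find_outliers(values, threshold=2.0):
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    # Flag any value that diverges from the mean by more than `threshold` standard deviations.
    return [v for v in values if abs(v - mean) > threshold * std]

card_amounts = [42, 38, 51, 45, 40, 39, 47, 980]  # one suspicious transaction
print(find_outliers(card_amounts))                # [980]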
6. Sequential Patterns
Sequential pattern mining is a data mining technique specialized for evaluating sequential data to
discover sequential patterns. It consists of finding interesting subsequences in a set of sequences,
where the value of a sequence can be measured in terms of different criteria like length, occurrence
frequency, etc.
In other words, this technique of data mining helps to discover or recognize similar patterns in
transaction data over some time.
7. Prediction
Prediction uses a combination of other data mining techniques such as trends, clustering,
classification, etc. It analyzes past events or instances in the right sequence to predict a future event.

Data Mining Implementation Process


Many different sectors are taking advantage of data mining to boost their business efficiency, including
manufacturing, chemical, marketing, aerospace, etc. Therefore, the need for a standard data
mining process increased. Data mining techniques must be reliable and repeatable by business
people with little or no knowledge of the data mining background. As a result, the cross-industry
standard process for data mining (CRISP-DM) was first introduced in the late 1990s, after going through
many workshops and with contributions from more than 300 organizations.
Data mining is described as a process of finding hidden precious data by evaluating the huge quantity
of information stored in data warehouses, using multiple data mining techniques such as Artificial
Intelligence (AI), Machine learning and statistics.
Let's examine the implementation process for data mining in details:
The Cross-Industry Standard Process for Data Mining (CRISP-DM)
CRISP-DM comprises six phases arranged as a cyclical process:

1. Business understanding
It focuses on understanding the project goals and requirements from a business point of view, then
converting this knowledge into a data mining problem definition and a preliminary plan designed to
accomplish the objectives.
Tasks:
o Determine business objectives

o Assess situation
o Determine data mining goals

o Produce a project plan

Determine Business Objectives


o Understand the project targets and prerequisites from a business point of view.

o Thoroughly understand what the customer wants to achieve.

o Reveal significant factors at the start that can impact the result of the project.

Assess situation
o It requires a more detailed analysis of facts about all the resources, constraints, assumptions,
and other factors that ought to be considered.

Determine data mining goals


o A business goal states the objective in business terminology. For example, increase catalog
sales to existing customers.
o A data mining goal describes the project objectives in technical terms. For example, predict how
many items a customer will buy, given their demographic details (Age, Salary, and City) and the
price of the item over the past three years.
Produce a project plan:
o It states the intended plan to accomplish the business and data mining goals.

o The project plan should define the expected set of steps to be performed during the rest of the
project, including the selection of techniques and tools.
2. Data Understanding
Data understanding starts with an initial data collection and proceeds with activities to get familiar
with the data, to identify data quality issues, to find first insights into the data, or to detect interesting
subsets to form hypotheses about hidden information.
Tasks:
o Collects initial data

o Describe data

o Explore data

o Verify data quality

Collect initial data


o It acquires the information mentioned in the project resources.

o It includes data loading if needed for data understanding.

o It may lead to initial data preparation steps.

o If various information sources are acquired, then integration is an additional issue, either here or at
the subsequent stage of data preparation.
Describe data
o It examines the "gross" or "surface" characteristics of the information obtained.

o It reports on the outcomes.

Explore data
o Addressing data mining issues that can be resolved by querying,
visualizing, and reporting, including:
o Distribution of important characteristics, results of simple aggregation.

o Establish relationships between small numbers of attributes.

o Characteristics of important sub-populations, simple statistical analysis.

o It may refine the data mining objectives.

o It may contribute to or refine the data description and quality reports.

o It may feed into the transformation and other necessary data preparation steps.

Verify data quality


o It examines the quality of the data and addresses related questions.

3. Data Preparation
o It usually takes more than 90 percent of the time.

o It covers all operations to build the final data set from the original raw information.

o Data preparation is likely to be performed several times and not in any prescribed order.

Tasks
o Select data

o Clean data

o Construct data
o Integrate data

o Format data

Select data
o It decides which data is to be used for evaluation.

o The data selection criteria include relevance to the data mining objectives, quality, and
technical limitations such as data volume boundaries or data types.
o It covers the selection of attributes (columns) as well as the selection of records (rows) in a table.

Clean data
o It may involve the selection of clean subsets of data, inserting appropriate defaults or more
ambitious methods, such as estimating missing information by modeling.

Construct data
o It comprises constructive data preparation operations, such as generating derived
attributes, entirely new records, or transformed values of existing attributes.

Integrate data
o Integrating data refers to the methods whereby data is combined from various tables or
records to create new records or values.
Format data
o Formatting data refers mainly to syntactic modifications made to the data that do not alter
its meaning but may be required by the modeling tool.

4. Modeling
In modeling, various modeling methods are selected and applied, and their parameters are calibrated to
optimal values. Some methods have particular requirements on the form of the data. Therefore, stepping
back to the data preparation phase may be necessary.
Tasks
o Select modeling technique

o Generate test design


o Build model

o Assess model

Select modeling technique


o It selects the actual modeling technique that is to be used, for example, decision tree or neural
network.
o If multiple techniques are applied, then this task is performed separately for each technique.

Generate test Design


o Generate a procedure or mechanism for testing the validity and quality of the model before
constructing the model. For example, in classification, error rates are commonly used as quality
measures for data mining models. Therefore, we typically separate the data set into train and test
sets, build the model on the train set, and assess its quality on the separate test set, as sketched below.

Build model
o To create one or more models, we need to run the modeling tool on the prepared data set.

Assess model
o It interprets the models according to domain knowledge, the data mining success criteria, and
the desired test design.
o It assesses the success of the application of the modeling and discovery methods technically.

o It contacts business analysts and domain specialists later to discuss the outcomes of data
mining in the business context.
5. Evaluation
o It evaluates the model thoroughly and reviews the steps executed to build the model, to
ensure that the business objectives are properly achieved.
o The main objective of the evaluation is to determine whether there is some significant business
issue that has not been adequately considered.
o At the end of this phase, a decision on the use of the data mining results should be reached.

Tasks
o Evaluate results

o Review process

o Determine next steps


Evaluate results
o It assesses the degree to which the model meets the organization's business objectives.

o It tests the model on test applications in the actual implementation, when time and budget limitations
permit, and also assesses the other data mining results produced.
o It unveils additional difficulties, suggestions, or information for future directions.

Review process
o The review process does a more detailed evaluation of the data mining engagement to
determine whether there is any significant factor or task that has somehow been overlooked.
o It reviews quality assurance problems.

Determine next steps


o It decides how to proceed at this stage.

o It decides whether to complete the project and move on to deployment, whether to initiate
further iterations, or whether to set up new data mining initiatives. This includes an analysis of the
remaining resources and budget, which influences the decision.
6. Deployment
Deployment refers to how the outcomes need to be utilized.

Deploy data mining results by:

o Scoring a database, utilizing the results as company guidelines, or interactive internet
scoring.
o The knowledge acquired will need to be organized and presented in a way that can be used by
the client. Depending on the demands, the deployment phase may be as simple as generating a
report or as complicated as applying a repeatable data mining process across the organization.
Tasks
o Plan deployment

o Plan monitoring and maintenance

o Produce final report

o Review project

Plan deployment:
o To deploy the data mining outcomes into the business, take the assessment results and
conclude a strategy for deployment.
o It refers to documentation of the process for later deployment.

Plan monitoring and maintenance


o It is important when the data mining results become part of the day-to-day business and its
environment.
o It helps to avoid unnecessarily long periods of misuse of data mining results.

o It needs a detailed analysis of the monitoring process.

Produce final report


o A final report can be drawn up by the project leader and his team.

o It may only be a summary of the project and its experience.

o It may be a final and comprehensive presentation of data mining.

Review project
o It evaluates what went right and what went wrong, what was done well and what was done
poorly, and what needs to be improved.

Data Mining Architecture


Introduction
Data mining is a significant method where previously unknown and potentially useful information is
extracted from the vast amount of data. The data mining process involves several components, and
these components constitute a data mining system architecture.
The significant components of data mining systems are a data source, data mining engine, data
warehouse server, the pattern evaluation module, graphical user interface, and knowledge base.
Data Source
The actual source of data is the Database, data warehouse, World Wide Web (WWW), text files, and
other documents. You need a huge amount of historical data for data mining to be successful.
Organizations typically store data in databases or data warehouses. Data warehouses may comprise
one or more databases, text files, spreadsheets, or other repositories of data. Sometimes, even plain text
files or spreadsheets may contain information. Another primary source of data is the World Wide Web
or the internet.
Different Processes
Before passing the data to the database or data warehouse server, the data must be cleaned, integrated,
and selected. As the information comes from various sources and in different formats, it can't be used
directly for the data mining procedure because the data may not be complete and accurate. So, first the
data needs to be cleaned and unified. More information than needed will be collected from various
data sources, and only the data of interest will have to be selected and passed to the server. These
procedures are not as easy as we think. Several methods may be performed on the data as part of
selection, integration, and cleaning.
Database or Data Warehouse Server
The database or data warehouse server consists of the original data that is ready to be processed.
Hence, the server is responsible for retrieving the relevant data, based on the user's data mining
request.
Data Mining Engine
The data mining engine is a major component of any data mining system. It contains several modules
for operating data mining tasks, including association, characterization, classification, clustering,
prediction, time-series analysis, etc.
In other words, we can say the data mining engine is the root of the data mining architecture. It comprises
instruments and software used to obtain insights and knowledge from data collected from various data
sources and stored within the data warehouse.
Pattern Evaluation Module
The pattern evaluation module is primarily responsible for measuring the interestingness of the patterns
by using a threshold value. It collaborates with the data mining engine to focus the search on interesting
patterns.
This segment commonly employs interestingness measures that cooperate with the data mining modules
to focus the search towards interesting patterns. It might utilize an interestingness threshold to filter out
discovered patterns. Alternatively, the pattern evaluation module might be integrated with the mining
module, depending on the implementation of the data mining techniques used. For efficient data
mining, it is generally suggested to push the evaluation of pattern interestingness as deep as possible into
the mining procedure so as to confine the search to only the interesting patterns.
Graphical User Interface
The graphical user interface (GUI) module communicates between the data mining system and the
user. This module helps the user to easily and efficiently use the system without knowing the
complexity of the process. This module cooperates with the data mining system when the user
specifies a query or a task and displays the results.
Knowledge Base
The knowledge base is helpful in the entire process of data mining. It might be helpful to guide the
search or to evaluate the interestingness of the resulting patterns. The knowledge base may even contain user views
and data from user experiences that might be helpful in the data mining process. The data mining
engine may receive inputs from the knowledge base to make the result more accurate and reliable. The
pattern assessment module regularly interacts with the knowledge base to get inputs, and also update
it.
KDD- Knowledge Discovery in Databases
The term KDD stands for Knowledge Discovery in Databases. It refers to the broad procedure of
discovering knowledge in data and emphasizes the high-level applications of specific Data Mining
techniques. It is a field of interest to researchers in various fields, including artificial intelligence,
machine learning, pattern recognition, databases, statistics, knowledge acquisition for expert systems,
and data visualization.
The main objective of the KDD process is to extract information from data in the context of large
databases. It does this by using Data Mining algorithms to identify what is deemed knowledge.
Knowledge Discovery in Databases is considered an automated, exploratory analysis and
modeling of vast data repositories. KDD is the organized procedure of recognizing valid, useful, and
understandable patterns from huge and complex data sets. Data Mining is the root of the KDD
procedure, involving the inferring of algorithms that investigate the data, develop the model, and find
previously unknown patterns. The model is used for extracting knowledge from the data, analyzing
the data, and making predictions from the data.
The availability and abundance of data today make knowledge discovery and Data Mining a matter of
impressive significance and need. In the recent development of the field, it isn't surprising that a wide
variety of techniques is presently accessible to specialists and experts.
The KDD Process
The knowledge discovery process is iterative and interactive, and comprises nine steps.
The process is iterative at each stage, implying that moving back to previous steps
might be required. The process has many imaginative aspects, in the sense that one cannot present a
single formula or a complete scientific categorization of the correct decisions for each step and
application type. Thus, it is necessary to understand the process and the different requirements and
possibilities in each stage.
The process begins with determining the KDD objectives and ends with the implementation of the
discovered knowledge. At that point, the loop is closed, and the Active Data Mining starts.
Subsequently, changes would need to be made in the application domain. For example, offering
various features to cell phone users in order to reduce churn. This closes the loop, and the impacts are
then measured on the new data repositories, and the KDD process again. Following is a concise
description of the nine-step KDD process, Beginning with a managerial step:

1. Building up an understanding of the application domain
This is the initial preliminary step. It sets the scene for understanding what should be done with the various
decisions like transformation, algorithms, representation, etc. The individuals who are in charge of a
KDD venture need to understand and characterize the objectives of the end-user and the environment
in which the knowledge discovery process will occur (including relevant prior knowledge).
2. Choosing and creating a data set on which discovery will be performed
Once the objectives are defined, the data that will be utilized for the knowledge discovery process should
be determined. This incorporates discovering what data is accessible, obtaining additional important data,
and afterward integrating all the data for knowledge discovery into one data set, including the attributes
that will be considered for the process. This step is important because Data Mining learns and discovers
from the accessible data; this is the evidence base for building the models. If some significant
attributes are missing, then the entire study may be unsuccessful; from this respect, the more attributes
that are considered, the better. On the other hand, organizing, collecting, and operating advanced data
repositories is expensive, so there is a trade-off against the opportunity for best understanding the
phenomena. This trade-off is one aspect where the interactive and iterative nature of the KDD process
comes into play. It begins with the best available data sets and later expands and observes the impact
in terms of knowledge discovery and modeling.
3. Preprocessing and cleansing
In this step, data reliability is improved. It incorporates data cleaning, for example, handling
missing values and removing noise or outliers. It might involve complex statistical techniques or
the use of a Data Mining algorithm in this context. For example, when one suspects that a specific attribute
is of insufficient reliability or has many missing values, this attribute could become the target
of a supervised Data Mining algorithm. A prediction model for this attribute will be created, and
after that, the missing data can be predicted. The extent to which one pays attention to this level relies
upon numerous factors. Regardless, studying these aspects is significant and regularly revealing by
itself, particularly for enterprise data frameworks.
4. Data Transformation
In this stage, appropriate data for Data Mining is prepared and developed. Techniques
here incorporate dimension reduction (for example, feature selection and extraction and record
sampling), also attribute transformation (for example, discretization of numerical attributes and
functional transformation). This step can be essential for the success of the entire KDD project, and it
is typically very project-specific. For example, in medical assessments, the quotient of attributes may
often be the most significant factor and not each one by itself. In business, we may need to think about
impacts beyond our control as well as efforts and transient issues. For example, studying the impact of
advertising accumulation. However, even if we do not utilize the right transformation at the start,
we may obtain a surprising effect that hints at the transformation required in the next
iteration. Thus, the KDD process reflects upon itself and prompts an understanding of the
transformation required.
5. Prediction and description
We are now prepared to decide on which kind of Data Mining to use, for example, classification,
regression, clustering, etc. This mainly relies on the KDD objectives, and also on the previous steps.
There are two significant objectives in Data Mining, the first one is a prediction, and the second one is
the description. Prediction is usually referred to as supervised Data Mining, while descriptive Data
Mining incorporates the unsupervised and visualization aspects of Data Mining. Most Data Mining
techniques depend on inductive learning, where a model is built explicitly or implicitly by generalizing
from an adequate number of training examples. The fundamental assumption of the inductive approach
is that the trained model applies to future cases. The technique also takes into account the level of
meta-learning for the specific set of accessible data.
6. Selecting the Data Mining algorithm
Having chosen the technique, we now decide on the tactics. This stage incorporates choosing a particular
method to be used for searching patterns, which may include multiple inducers. For example, considering
precision versus understandability, the former is better with neural networks, while the latter is better
with decision trees. For each strategy of meta-learning, there are several possibilities for how it can be
applied. Meta-learning focuses on clarifying what causes a Data Mining algorithm to be fruitful or
not in a specific issue. Thus, this methodology attempts to understand the situations under which a Data
Mining algorithm is most suitable. Each algorithm has parameters and learning strategies, such as
ten-fold cross-validation or another division of the data for training and testing.
7. Utilizing the Data Mining Algorithm
At last, the implementation of the Data Mining algorithm is reached. In this stage, we may need to
utilize the algorithm several times until a satisfying outcome is obtained, for example, by tuning the
algorithm's control parameters, such as the minimum number of instances in a single leaf of a decision
tree.
8. Evaluation
In this step, we assess and interpret the mined patterns, rules, and their reliability with respect to the
objectives defined in the first step. Here we consider the preprocessing steps in terms of their impact on
the Data Mining algorithm results, for example, adding a feature in step 4 and repeating from there. This
step focuses on the comprehensibility and utility of the induced model. In this step, the identified
knowledge is also recorded for further use. The last step is the use of, and overall feedback on, the
discovery results acquired by Data Mining.
9. Using the discovered knowledge
Now, we are prepared to incorporate the knowledge into another system for further action. The
knowledge becomes effective in the sense that we may make changes to the system and measure the
impacts. The accomplishment of this step determines the effectiveness of the whole KDD process. There
are numerous challenges in this step, such as losing the "laboratory conditions" under which we have
worked. For example, the knowledge was discovered from a certain static snapshot (usually a fixed set
of data), but now the data becomes dynamic. Data structures may change (certain quantities may become
unavailable), and the data domain might be modified, such as an attribute having a value that
was not expected previously.
Data Mining vs Machine Learning
Data Mining relates to extracting information from a large quantity of data. Data mining is a technique
of discovering different kinds of patterns that are inherent in the data set and which are precise, new,
and useful. Data Mining works as a subset of business analytics and is similar to experimental
studies. Data Mining's origins lie in databases and statistics.
Machine learning includes an algorithm that automatically improves through data-based experience.
Machine learning is a way to find a new algorithm from experience. Machine learning includes the
study of an algorithm that can automatically extract the data. Machine learning utilizes data mining
techniques and another learning algorithm to construct models of what is happening behind certain
information so that it can predict future results.
Data Mining and Machine Learning are areas that have influenced each other; although they have much
in common, they serve different ends.
Data Mining is performed on a given data set, by humans, to find interesting patterns among the items
in the data set. Data Mining uses techniques created by machine learning to predict results, while
machine learning is the capability of a computer to learn from a mined data set.
Machine learning algorithms take information that represents the relationships between items in data
sets and create models in order to predict future results. These models are essentially the actions the
machine will take to achieve a result.
What is Data Mining?
Data Mining is the process of extracting previously unknown data patterns from huge sets of data;
hence, as the phrase suggests, we "mine for specific data" from a large data set. Data mining, also
called the Knowledge Discovery Process, is a field of science used to determine the properties of data
sets. Gregory Piatetsky-Shapiro coined the term "Knowledge Discovery in Databases" (KDD) in 1989,
and the term "data mining" appeared in the database community around 1990. Huge sets of data
collected from data warehouses or complex data sets, such as time series or spatial data, are processed
in order to extract interesting correlations and patterns between the data items. The output of a data
mining algorithm is often used as input for Machine Learning algorithms.
What is Machine learning?
Machine learning is concerned with designing and developing a machine that can learn by itself from a
specified set of data to obtain a desirable result without being explicitly coded. Hence machine learning
means "a machine which learns on its own". Arthur Samuel, an American pioneer in the fields of
computer gaming and artificial intelligence, coined the term Machine Learning in 1959, saying that it
"gives computers the ability to learn without being explicitly programmed."
Machine learning is a technique that builds complex algorithms for large-scale data processing and
delivers outcomes to its users. It uses programs that can learn through experience and make
predictions.
The algorithms improve themselves through the frequent input of training data. The aim of machine
learning is to understand the data and build models from it that can be understood and used by
humans.
Machine learning algorithms are divided into two types:
1. Unsupervised Learning
2. Supervised Learning
1. Unsupervised Machine Learning:
Unsupervised learning does not depend on labelled training data to predict results; instead, it uses
techniques such as clustering and association to find structure in the data. (A labelled training data set
is one in which each input is paired with a known output.)
2. Supervised Machine Learning:
As the name implies, supervised learning involves the presence of a supervisor acting as a teacher.
Supervised learning is a learning process in which we teach or train the machine using data that is well
labelled, meaning some of the data is already tagged with the correct answers. After that, the machine
is given new sets of data so that the supervised learning algorithm, having analyzed the training data,
can produce accurate results from the labelled examples.
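To make the contrast concrete, here is a minimal sketch (assuming scikit-learn and its Iris data set, neither of which is mentioned in the presentation) of the two styles side by side: the clustering step never sees the labels, while the classifier is trained on them:

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Unsupervised: group the rows into 3 clusters without ever using y.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Supervised: learn from labelled training data, then score on held-out data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("Cluster labels of the first five rows:", clusters[:5])
print("Supervised test accuracy:", model.score(X_test, y_test))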
Major Difference between Data mining and Machine learning
1. Data mining techniques are introduced using two components: the database and machine learning.
The database provides data management techniques, while machine learning provides methods for
data analysis; machine learning methods, in turn, are introduced through algorithms.
2. Data Mining uses large amounts of data to obtain helpful information, and that data helps to predict
future results. For example, a marketing company may use last year's data to predict this year's sales.
Machine learning does not depend as heavily on stored data; it relies on algorithms. Many
transportation companies, such as OLA and UBER, use machine learning techniques to calculate the
ETA (Estimated Time of Arrival) for rides.
3. Data mining is not capable of self-learning; it follows predefined guidelines and provides the answer
to a specific problem. Machine learning algorithms, by contrast, are self-adjusting: they can alter their
rules according to the situation, find a solution to a specific problem, and resolve it in their own way.
4. The main and most important difference is that data mining cannot work without the involvement of
humans, whereas in machine learning human effort is needed only when the algorithm is defined; after
that, it works everything out on its own. Once implemented, a machine learning model can be used
indefinitely, which is not the case with data mining.
5. Because machine learning is an automated process, the results it produces tend to be more precise
than those of data mining.
6. Data mining uses the database, data warehouse server, data mining engine, and pattern assessment
techniques to obtain useful information, whereas machine learning uses neural networks, predictive
models, and automated algorithms to make decisions.
Data Mining vs Machine Learning

Factors             | Data Mining                                                        | Machine Learning
Origin              | Traditional databases with unstructured data.                      | Works from an existing algorithm and data.
Meaning             | Extracting information from a huge amount of data.                 | Introduces new information from data as well as previous experience.
History             | Formalised as Knowledge Discovery in Databases (KDD) in 1989.      | The first learning program, Samuel's checker-playing program, was built in the 1950s.
Responsibility      | Used to obtain the rules from the existing data.                   | Teaches the computer how to learn and comprehend the rules.
Abstraction         | Abstracts patterns from the data warehouse.                        | Learns from the data presented to the machine.
Applications        | Compared to machine learning, it can produce outcomes on a smaller volume of data; also used in cluster analysis. | Needs a large amount of data to obtain accurate results; applications include web search, spam filtering, credit scoring, computer design, etc.
Nature              | Involves more human interference and is largely manual.            | Automated; once designed and implemented, there is no need for human effort.
Techniques involved | More of a research activity, using techniques such as machine learning. | A self-learned and trained system that performs the task precisely.
Scope               | Applied in limited fields.                                          | Can be used over a vast area.
DATABASE IN E-COMMERCE
PRESENTED BY: OLAYIWOLA FOLORUNSO OLAYIWOLA UIL/PG2022/0349
1.0 INTRODUCTION
As technology continues to evolve, it has become clear that e-commerce continues to evolve and help
businesses as well. Before the internet was invented, customers physically entered shops to purchase
goods and request services. Today, customers can purchase products online from their own homes and
even obtain overseas products. In 2020, there were estimated to be 12-24 million e-commerce sites
globally. E-Commerce (Electronic Commerce) itself is a service used to carry out transactions; in other
words, e-commerce refers to a platform where a website provides space for various forms of online
transactions, in the form of trading activities and purchases, while relying on internet facilities that play
an important role in following up on the process. The development of technology has an increasingly
strong impact on the systematic performance of e-commerce, providing opportunities for good
communication and interaction between people around the world and offering efficiency across many
areas.

E-commerce, of course, cannot be separated from the use of databases: the database acts as a container
for all forms of data related to the transactions carried out on the e-commerce website. Processing the
various forms of data in the database accelerates the acquisition of information, which in turn improves
service to customers. Because a database works by keeping similar records consisting of many
interconnected fields, it suits e-commerce, which performs many types of activities such as sales,
purchase planning, exchanges between businesspeople, and the internal processes companies use to
keep the business running. Therefore, using databases in e-commerce makes it easier to access and
store all the data and information contained in the platform. In addition, databases provide a safeguard
for all forms of operational activity, so that the e-commerce operator can act swiftly and appropriately
when dealing with problems. A database uses tables, rows, and columns, similar to a spreadsheet, to
organize and retrieve data.
2.0 ROLES OF DATABASE IN E-COMMERCE
The use of databases in e-commerce has a major influence on the sustainability of the operation, which
can be described as follows:
a. Maintain file structure changes in databases in e-commerce.
This is done to prevent other people who access the data from making unauthorized changes to it.
b. Reduce data redundancy in e-commerce.
The role of the database is also to minimize duplicate data, that is, the repeated storage of the same data
from one system in different locations. For example, suppose a buyer wants to change the type of item
before completing a transaction; the database is used to read and update the existing record so that the
previous transaction does not produce duplicate buyer transaction data.
c. Maintain customer data security in e-commerce.
One of the most important uses of databases is to maintain the security of the data and all kinds of
information attached to buyers, sellers, operations, and so on, so that the data cannot easily be accessed
by unauthorized people.
d. Make it easy to access the data.
In e-commerce, databases make it convenient for the user to access data in the system because the
database keeps the stored data neatly and well organized.
e. Transactional information: one of the most important jobs a database does is to track and manage
transaction information. A database can keep your inventory up to date after each transaction, covering
products coming in or out of stock, billing, shipping statuses, purchase orders, and more.
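As a rough sketch of that transactional role (using Python's built-in sqlite3 module and invented table names, neither of which appears in this presentation), an order insert and the matching stock update can be wrapped in a single database transaction so that they succeed or fail together:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, stock INTEGER)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER, qty INTEGER)")
conn.execute("INSERT INTO product VALUES (1, 'USB cable', 10)")
conn.commit()

try:
    with conn:  # everything inside commits together or rolls back together
        conn.execute("INSERT INTO orders (product_id, qty) VALUES (?, ?)", (1, 2))
        conn.execute("UPDATE product SET stock = stock - ? WHERE id = ?", (2, 1))
except sqlite3.Error as exc:
    print("Order rejected, nothing was written:", exc)

print(conn.execute("SELECT stock FROM product WHERE id = 1").fetchone())  # -> (8,)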
TYPES OF DATABASES
1. Relational databases such as MySQL, PostgreSQL, MariaDB, Microsoft SQL Server, Amazon RDS,
and Azure SQL Database.
2. Non-relational databases such as MongoDB, Apache Cassandra, Amazon DynamoDB, Azure
Cosmos DB, and Couchbase.
3.0 FEATURES OF GOOD E-COMMERCE DATABASE DESIGN
Simple, functional database structure: The database table structure is simple but covers all the
required functionality without compromising the user experience.
High performance: Database queries execute quickly to facilitate live customer interactions and
support a frictionless shopping experience. Therefore, the selected database should have good indexing
and performance optimization options.
High availability and scalability: A good database design is highly available with automatic
snapshots and enables automatic scaling to support future platform growth as well as sudden traffic
spikes.
Based on these characteristics, a good e-commerce database design involves three key parts:
Database scope/ Coverage: The scope refers to the planned functionality of the database. The
underlying table structure of the database, its relationships, and indexes all depend on the functionality
of the e-commerce platform.
Database type: The type can vary from a relational database to a NoSQL database or a hybrid
approach depending on the requirements and the underlying data structure.
Database infrastructure: Your database can be either unmanaged or managed.
3.1.1 Some common types of data that databases store and track for e-commerce include:
Product information
Customer information
Transactional information
The database also uses some terms that we may often hear, such as :
Table
Column
Row
Redundancy
Primary Key
Foreign Key
Compound Key
Index
Referential Integrity
Transactions that occur in e-commerce itself consist of Business to Business (B2B), Business to
Customer (B2C), and Customer to Customer (C2C).
The core functions of database design in e-commerce and their application:
1. User Management
This structure means that the user table contains all user details, while the user payment table stores
user payment details. This structure provides more granular data control.
2. Product Management
This structure shows that further separate tables, the discount, product inventory, and product category
tables, are connected to the product table through database relationships; this approach provides the
greatest level of flexibility for the database system.
3. Shopping Process
The reason why many e-commerce platforms use databases is that databases provide many benefits. A
few of the benefits of using a database in e-commerce include the following:
• Avoid human error
• Analyze data in various ways
• Carry out inventory management more efficiently
• Help users gain easier access to the required data
• Help businesses protect data so that only authorized users can access it
• Help businesses restore and back up data that has been damaged or contains errors
• Help pinpoint potential customers and business opportunities
• Help businesses evolve and adapt to the marketing environment
DATABASE SCHEMA DESIGN AND NORMALIZATION
Normalization is a crucial aspect of database schema design that focuses on organizing the schema to
reduce duplication and anomalies. The normalization process consists of several stages, called normal
forms (1NF, 2NF, 3NF, etc.). Here are some essential normalization principles to follow while
designing an e-commerce database (a small schema sketch follows this list):
1NF (First Normal Form): Ensure that each column in the table is atomic, meaning it stores a single
value, and does not contain any repeating groups or multiple values. This level of normalization
simplifies data retrieval and update processes.
2NF (Second Normal Form): Make sure that each non-key column is fully functionally dependent on
the primary key; in other words, the entire primary key is necessary to determine the value of any non-
key column. This stage of normalization helps avoid partial dependency and redundancy.
3NF (Third Normal Form): Eliminate any transitive dependency by ensuring that every non-key
column is directly dependent on the primary key. This protects data integrity by preventing undesirable
update anomalies.
BCNF (Boyce-Codd Normal Form): While not always necessary for e-commerce databases, BCNF
is a more stringent level of normalization that requires every determinant in the table to be a candidate
key. It helps to eliminate additional redundancy and maintain consistency.
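As a small illustration of these principles (a sketch only, using Python's sqlite3 module and invented table names, none of which come from the presentation), customer and product facts can live in their own tables so that the orders table stores only atomic values and foreign keys, keeping it in third normal form:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE product (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    price REAL NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    product_id  INTEGER NOT NULL REFERENCES product(id),
    quantity    INTEGER NOT NULL  -- atomic values only (1NF)
);
""")
# Customer and product details are stored exactly once, so there are no
# partial or transitive dependencies to cause update anomalies.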
DATABASE SECURITY & BACKUP STRATEGIES
Ensuring the security and integrity of an e-commerce database is of paramount importance. Since e-
commerce databases store sensitive customer information such as shipping addresses, contact
information, and payment details, implementing strong security and backup measures is crucial.
Data Security Best Practices
Encryption: Use encryption for both data at rest and data in transit. Encrypting data at rest protects
stored data from unauthorized access, while encrypting data in transit safeguards it from interception
during communication. Use strong encryption algorithms such as AES and rotate your encryption keys
regularly (a brief sketch of field-level encryption appears after this list).
Access Control: Implement well-defined access control policies, restricting users and applications to
the minimum necessary permissions for their roles. In addition, use authentication mechanisms like
multi-factor authentication to strengthen security.
Network Security: Secure the network environment by employing firewalls, intrusion detection
systems, and anti-malware software. Regularly monitor and audit network traffic for suspicious
activity.
Vulnerability Management: Regularly scan your system for vulnerabilities, update your software,
and apply patches as needed. Be proactive in finding and addressing security weaknesses.
Logging and Monitoring: Implement detailed logging and monitoring procedures to track user
activity, database changes, and potential security threats. Regularly analyze logs for anomalies and
respond to potential breaches accordingly.
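Here is a brief sketch of the field-level encryption idea mentioned under "Encryption" above. It assumes the third-party "cryptography" package; the field name and key handling are placeholders, and a real deployment would load the key from a secrets manager rather than generating it inline:

from cryptography.fernet import Fernet

key = Fernet.generate_key()   # placeholder; in practice, load from a secrets manager
fernet = Fernet(key)

card_number = b"4111 1111 1111 1111"
stored_value = fernet.encrypt(card_number)   # this ciphertext is what goes into the table
recovered = fernet.decrypt(stored_value)     # only code holding the key can read it back

assert recovered == card_number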
Thanks.
Web Database Development and Database Administration.
PRESENTED BY: ADIMULA OLUWAMAYOWA ANNE 21/68HC003…, UIL/PG2022/0476
Database administration is the function of managing and maintaining database management
systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM Db2 and Microsoft SQL
Server need ongoing management. As such, corporations that use DBMS software often hire
specialized information technology personnel called database administrators or DBAs.
Responsibilities
• Installation, configuration and upgrading of database server software and related products.
• Evaluate database features and database-related products.
• Establish and maintain sound backup and recovery policies and procedures.
• Take care of database design and implementation.
• Implement and maintain database security (create and maintain users and roles, assign privileges).
• Database tuning and performance monitoring.
• Application tuning and performance monitoring.
• Set up and maintain documentation and standards.
• Plan growth and changes (capacity planning).
• Work as part of a team and provide 24/7 support when required.
• Do general technical troubleshooting and give consultation to development teams.
• Database recovery.
Types
There are three types of DBAs:
1. Systems DBAs (also referred to as physical DBAs, operations DBAs or production Support
DBAs): focus on the physical aspects of database administration such as DBMS installation,
configuration, patching, upgrades, backups, restores, refreshes, performance optimization,
maintenance and disaster recovery.
2. Development DBAs: focus on the logical and development aspects of database administration
such as data model design and maintenance, DDL (data definition language) generation, SQL
writing and tuning, coding stored procedures, collaborating with developers to help choose the
most appropriate DBMS feature/functionality and other pre-production activities.
3. Application DBAs: usually found in organizations that have purchased 3rd party application
software such as ERP (enterprise resource planning) and CRM (customer relationship
management) systems. Examples of such application software include Oracle Applications,
Siebel and PeopleSoft (both now part of Oracle Corp.) and SAP. Application DBAs straddle
the fence between the DBMS and the application software and are responsible for ensuring that
the application is fully optimized for the database and vice versa. They usually manage all the
application components that interact with the database and carry out activities such as
application installation and patching, application upgrades, database cloning, building and
running data cleanup routines, data load process management, etc.
The database plays a critical role in web app development and is one of the most important aspects of
building an application. It is necessary to have good knowledge of databases before using them in your
application. Database design plays a key role in the operation of your website and provides you with
information regarding transactions, data integrity, and security issues. In this section, you will learn the
role of databases in web application development, the most popular web app databases, and how to
connect a database to a web application.

In the early days of computing, databases were synonymous with files on disk. The term is still
commonly used this way, for example when people refer to their hard drive as their "main database".
Data is the foundation of a web application. It is used to store user information, session data, and other
application data. The database is the central repository for all of this data. Web applications use a
variety of databases to store data such as flat files, relational databases, object-relational databases, and
NoSQL databases. Each type of database has its own advantages and disadvantages when it comes to
storing and retrieving data.
A database is a collection of data and information that is stored in an organized manner for easy
retrieval. The primary purpose of a database is to store, retrieve, and update information. A database
can be used to store data related to any aspect of business operations.
Databases can be very large, containing millions of records, or very small, containing just a few
records or even a single record. They may be stored on hard disks or other media, or they may exist
only in memory. In the early days of computing, databases were stored on tape drives or punch cards.
Today they're stored on hard drives, flash memory cards, and other media.
Databases are designed to ensure that the data they contain is organized and easily retrievable. A
database management system (DBMS) is the software used to create and maintain a database.
Role of Database in Web Application
Web application development agencies, developers, and designers use databases to store and organize
the data that their applications need. The role of databases in web application development has grown
over time, and as a result many developers now build applications around them. You can't fully
understand web application development without understanding the role of databases. A database is
nothing but an organized collection of data that helps us whenever we create or modify a program.
Everyday examples of this kind of organization are a bookshelf, NAS storage, and even the databases
on your desktop computer.
The role of the database in a web application is very important. The web application interacts with the
database to store and retrieve data, and the database holds all the information that the user needs to
keep. For example, if you are developing a shopping cart website, it will contain product details,
customer details, order details, and so on; you need to store this information in a database so that it can
be used later.
Why Do Web App Developers Need a Database?
The first thing one should know when it comes to databases is the need. There are huge numbers of
businesses out there, whose revenue depends on the success and future of their database. You see, a
database is extremely important for online companies and businesses as well. These days databases are
used for various purposes like managing financial records, setting up customer profiles, keeping
inventory and ordering information, etc. But what does all this mean?
Most modern web applications are based on a database. The database stores information about the
users, products, orders, and more. A database is an important component of any web application
because it provides a central location for storing user information and business logic. In addition to
this, it allows you to store complex data structures with minimal effort.
Databases are used by businesses to collect and store customer information, financial records, and
inventory data. They're also used in research projects to store information about experiments or tests.
For example, if you were conducting a survey on the habits of people who eat cereal for breakfast, you
might use a database to keep track of your results.
Databases are also used by government agencies to store public records like birth certificates and
marriage licenses. Databases are also used by medical researchers who need to record the medical
history of patients in order to determine how effective certain treatments may be for different diseases
or conditions.
Web Application Databases Offer Benefits
Web applications are becoming more and more popular because they allow users to access information
from different devices at the same time. A web application database offers benefits such as:
Security
A web application database provides security features such as encryption and password protection. If a
user's password is lost or compromised, these measures make it much harder for anyone else to access
the information stored in the database.
Accessibility
Users can access their data from any internet-enabled device, which includes smartphones and tablets
as well as laptops and desktops. This means that users do not have to worry about losing their valuable
data because it is stored on another device.
Reliability and scalability
Web applications are usually accessed by many users simultaneously, unlike traditional desktop
applications that are accessed by one person at a time, so web apps need to be able to handle more
requests simultaneously than their desktop counterparts. Web application databases use distributed
architecture (multiple servers) to scale up quickly when demand increases, so they can handle large
numbers of simultaneous requests without slowing down or crashing.
Ease of maintenance for IT staff
Because web application databases use a distributed architecture, problems can be isolated and fixed
quickly, which reduces downtime for the end user and reduces costs for the IT staff responsible for
maintaining the system. Also, database automation tools can make routine database tasks easier and
safer.
Types of Databases in Web Application
A database is a collection of records, each of which is similar in structure to the other records in the
same database. There are two broad types of databases: relational and non-relational. Relational
databases are built on the principle of tabular data: information is organized into tables of rows and
columns, with defined relationships between tables. A non-relational database is also known as a
NoSQL database.
Relational
A database is a large collection of structured data, which can be accessed to find specific information.
Relational databases are famous for their structure and have been used by programmers for years.
List of Popular Web App Databases
Many different types of databases exist, with different features and capabilities. Some databases are
relational (SQL-based), while others are non-relational (NoSQL). The following are among the best
databases for web applications; depending on your needs, choose the right database for building your
software applications.
MySQL (Relational)
MySQL is a relational database management system (RDBMS) based on SQL. It is a popular database
server and a multi-user, multi-threaded SQL database, currently developed by Oracle Corporation. The
name "MySQL" combines "My", the first name of co-founder Michael Widenius's daughter, with
"SQL". It is written in the C and C++ programming languages. It is free and open-source software
licensed under the GNU General Public License version 2, and it is also offered under commercial
licenses.

The MySQL database is often used for data storage, especially in web applications, and it is also
widely used for creating and maintaining relational database tables. MySQL was originally developed
by a Swedish company called MySQL AB, which was bought by Sun Microsystems in 2008; Sun was
in turn acquired by Oracle Corporation in 2010, which now owns and manages the project.
It has become one of the most popular open-source databases in the world, used in web and mobile
applications by corporations large and small across all industries.
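A minimal connection sketch (assuming the mysql-connector-python driver; the host, credentials, schema and table names are placeholders, not part of this presentation) looks like this:

import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="app_user", password="app_password", database="shop"
)
cursor = conn.cursor()
cursor.execute("SELECT id, name, price FROM product WHERE price < %s", (20.0,))
for product_id, name, price in cursor.fetchall():
    print(product_id, name, price)
cursor.close()
conn.close()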
PostgreSQL (Relational)
An object-relational database management system that supports SQL-based queries, similar to those
used by other RDBMS systems such as MySQL or Oracle Database. PostgreSQL is developed and
maintained by PostgreSQL Global Development Group, which is made up of several companies and
individuals who have contributed code to the project over time.
PostgreSQL's developers do not require contributors to sign a Contributor License Agreement (CLA).
The PostgreSQL license includes a clause requiring attribution of original authorship if it is not
recorded automatically by the contributor's revision control system.

The software is distributed under the PostgreSQL License, a permissive open-source license, which
allows anyone to use it for any purpose without paying royalties or fees.
MongoDB (Non-Relational)
MongoDB is an open-source document-oriented database developed by MongoDB Inc. (formerly
10gen). The first version was released in 2009. It is written in C++ and provides a document-oriented
data model that can be queried using a JSON-like query language.

A document can be thought of as a virtual "sheet" or "record" in a spreadsheet application such as
Microsoft Excel or Google Sheets. A document contains multiple fields, comparable to cells in an
Excel spreadsheet or in an Access database table, and these fields can have different types: text,
numbers, dates, and so on.
MongoDB's development began in 2007 at 10gen, a company founded by engineers who had
previously built DoubleClick. They were working on a platform-as-a-service product for web
applications and found that existing commercial and relational databases did not meet its requirements,
so they developed a new document-oriented data store better suited to the needs of web applications.
In 2009, 10gen open-sourced that data store as a standalone product named MongoDB and released its
first production version; the company later changed its name to MongoDB Inc. in 2013.
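A small sketch of MongoDB's document model (assuming the pymongo driver and a locally running server; the database and collection names are placeholders invented for illustration):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
products = client["shop"]["products"]

# Documents are schema-free, JSON-like objects; fields can differ per document.
products.insert_one({"name": "USB cable", "price": 4.5, "tags": ["cables", "usb"]})
for doc in products.find({"price": {"$lt": 10}}):
    print(doc["name"], doc["price"])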
Cassandra (Non-Relational)
Cassandra is an open-source database management system that runs on many servers, making it well-
suited for handling large amounts of data. It offers fast performance and can scale up to a petabyte of
data across multiple servers, making it useful for applications with high write-throughput
requirements.
Cassandra is built on the principles of Dynamo with the goal of addressing some of its problems. The
technology was developed at Facebook and accepted as an Apache Incubator project in 2009. It
graduated from incubation to become an Apache Top-Level Project (TLP) in February 2010.
Cassandra's architecture is based on Dynamo, but differs from it significantly in its design details,
especially regarding consistency guarantees and failure detection mechanisms. In particular, Cassandra
does not provide strong consistency; instead, it aims to provide high availability by making it easy to
deploy multiple copies of the data across many hosts while tolerating failures at any one host. This
makes Cassandra a popular choice for internet startups that must scale quickly and cheaply.
Cassandra is often described as a key-value store, but in practice it is a partitioned wide-column store
with a flexible data model, so you can use it to store virtually any kind of data. You can also use
Cassandra for full-text search, or even for storing graph data (although there are better options for
graph storage than Cassandra).
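A rough sketch of how a table is typically modelled around a partition key in Cassandra (assuming the DataStax cassandra-driver package and a local node; the keyspace and table are invented for illustration):

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()
session.execute("""
CREATE KEYSPACE IF NOT EXISTS shop
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
CREATE TABLE IF NOT EXISTS shop.orders_by_customer (
    customer_id uuid,
    order_time  timestamp,
    total       decimal,
    PRIMARY KEY (customer_id, order_time)  -- partition key plus clustering column
)
""")
# Queries are then designed around the partition key, e.g.:
# SELECT * FROM shop.orders_by_customer WHERE customer_id = ?;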
Neo4j (Graph database)
Neo4j is an open-source graph database management system that stores data in a native graph database
format. It's designed to store data and query it very quickly, making it ideal for applications that
involve complex relationships between entities. It uses the native graph data model to provide ACID
transactions, high availability, and indexing. It's used by many companies to power their critical
applications, including eBay and Walmart.
Unlike relational databases, Neo4j doesn't enforce a schema on your data. This makes it easier to build
applications that model real-world problems such as social networks or product recommendations.
You can create multiple nodes for the same entity without duplicating data or having to use foreign
keys. In addition, Neo4j allows you to add properties to existing nodes without having to create a new
table first. These features make Neo4j much more agile than traditional relational databases when
modeling complex relationships between entities with many attributes and relationships between them.
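A short sketch of that graph model (assuming the official neo4j Python driver and a local server; the URL, credentials, labels and properties are placeholders), in which a purchase is stored as a relationship rather than a join table:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    # Create (or reuse) a customer node, a product node, and the edge between them.
    session.run(
        "MERGE (c:Customer {name: $name}) "
        "MERGE (p:Product {sku: $sku}) "
        "MERGE (c)-[:BOUGHT]->(p)",
        name="Ada", sku="USB-001",
    )
    # Traverse the relationship instead of joining tables.
    result = session.run(
        "MATCH (:Customer {name: $name})-[:BOUGHT]->(p:Product) RETURN p.sku AS sku",
        name="Ada",
    )
    print([record["sku"] for record in result])
driver.close()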
MariaDB (Relational)
MariaDB is a fork of the MySQL relational database management system intended to remain free
under the GNU GPL. MariaDB was forked in 2009 by some of the original developers of MySQL amid
concerns about the future of the community-developed version of MySQL after Oracle's acquisition of
Sun Microsystems (and with it MySQL), and the possibility of it being steered toward a paid enterprise
product.
The original developers of MySQL created MariaDB to provide a better development environment and
more robust performance. MariaDB strives to be compatible with MySQL and includes most of its
storage engines. However, not all features are supported in MariaDB Server so it is recommended that
you check for compatibility before using any feature that may be affected by a bug or limitation in
MariaDB Server.
MSSQL (Relational)
MSSQL databases are the core of Microsoft SQL Server. It is a relational database management
system (RDBMS), a special type of database software that is used to create, store and manipulate data
in an organized manner.
MSSQL can be used to build enterprise-level business solutions and applications. Regardless of the
platform or device your users are using, you can use MSSQL to create a centralized data store with a
single version of the truth. You can also use it to create a single source of truth for your data analytics
and reporting technologies, such as Power BI and Tableau.
Conclusion
The database is an integral part of any Web application or website. Whether it is used for storing data
in an easy-to-access manner or for maintenance, the database is going to play a role in the success of
your project and you can't overlook it. For those who are simply going to be accessing data, the
strength of the database will not matter much as long as it has all the functionality they need.
However, those who plan on using it or maintaining it should really explore why one database type
may work better than another. If a web app is going to run fast and efficiently with minimal downtime,
every consideration needs to be made so that bottlenecks do not occur. The success of your project
may depend on your choice of database.
Popular back-end languages used with these databases in web development include:
• Python – the most popular open-source, back-end web development language in 2023.
• PHP – an open-source scripting language.
• Java – an object-oriented, platform-independent, and secure programming language.
• C#
• Ruby
• Swift
• Kotlin