Lect 2

Download as pdf or txt
Download as pdf or txt
You are on page 1of 30

CSC – 426 Business Intelligence and

Analytics

Muhammad Bilal
Sr. Lecturer
Department of Computer Science
Bahria University, Karachi
Learning Objectives
Upon successful completion of this chapter, you will be able to:
• Overview of Business Intelligence Environment, Key Components.
• Business and Data Model.
• Data Sources in BI, ETL Process, Data Warehouse and Data Marts.
• BI System Architecture.
• BI Platforms and Applications.
What is BI Environment?
• A Business Intelligence (BI) Environment is a comprehensive system that
consolidates, processes, and analyzes data from various sources to
support business decision-making. It integrates technology, applications,
and processes to collect, store, and present business data.
• A collection of processes, tools and technologies helpful in achieving more
profit by considerably improving the productivity, sales and service of an
enterprise. Using BI techniques, corporate data is analyzed, organized and
converted into useful information. This information is then used to initiate
a profitable business action.
• A typical BI environment involves business models, data models, data
sources, ETL, tools needed to transform and organize the data into useful
information, target data warehouse, data marts, OLAP analysis and
reporting tools.
• Setting up a business intelligence environment not only rely on tools,
techniques and processes, it also requires skilled business people to
carefully drive these in the right direction.
Key Components of a BI Environment
• Data Sources:
– Includes databases, ERP systems, CRM systems, spreadsheets, and external sources like
social media and market data.

• ETL (Extract, Transform, Load):


– The process of extracting data from various sources, transforming it into a consistent
format, and loading it into a data warehouse.

• Data Warehousing:
– A central repository where integrated data from multiple sources is stored, structured
for querying and analysis.

• Data Analytics and Reporting Tools:


– Tools and technologies used to analyze data and generate reports. Examples include
Power BI, Tableau, and QlikView.

• Dashboards and Visualization:


– Interfaces that present data in a visually intuitive manner, enabling users to interact
with and explore data insights.
Business Intelligence Environment
Care should be taken in :

– understanding the business requirement,

– Setting up the targets,

– Analyzing and defining the various processes associated with these,


determining what kind of data needed to be analyzed,

– Determining the source and target for that data,

– Defining how to integrate that data for BI analysis and

– Determining and gathering the tools and techniques to achieve this goal.
Business Intelligence Environment
BI Environment (Cont.)
• The term business model refers to a company’s plan for
making a profit. It identifies the products or services the
business plans to sell, its identified target market, and any
anticipated expenses.

• Data Modelling is the process of creating a data model for the


data to be stored in a database. This data model is a
conceptual representation of data objects, the association
between different data objects, and the rules.

• There are three types of data modelling techniques for


business intelligence: Conceptual, Logical, and Physical.
Conceptual Data Model
• Conceptual data modeling examines the business’s operations, intending to create a
model with the most important parts (such as describing a store’s order system).
Essentially, this data model defines what data the system will contain.
• Example: Entity Relationship Diagrams.
Logical Data Model
• Logical data modeling examines business functions (e.g. manufacturing, shipping),
intending to create a model describing how each operation works within the whole
company. It also defines how a system should be implemented, by mapping out
technical rules and data structures.
• Example: Data Flow Diagram, Data Vaults.
Physical Data Model
• Physical data modeling examines how the database will actually be implemented,
intending to model how the databases, applications, and features will interact with
each other. Here, the actual database is created; the schema structure is developed,
refined, and tested. Data models generated should support key business operations.
• Example: Data Matrices.
Conceptual vs Logical vs Physical Data Model
Data Sources in BI
• Data sources are the backbone of any BI system, providing the raw
material required for analysis and reporting. They include a variety of
structured and unstructured data formats, originating from internal
business operations as well as external sources.
• A data source is the location where data that is being used originates
from.
• A data source may be the initial location where data is born or where
physical information is first digitized. Concretely, a data source may be a
database, a flat file, live measurements from physical devices or scraped
web data.
• Example: Imagine a fashion brand selling products online. To display
whether an item is out of stock, the website gets information from an
inventory database. In this case, the inventory tables are a data source,
accessed by the web application which serves the website to customers.
Types of Data Sources
• Internal Data Sources

– Operational Databases: Data collected from internal systems such as


CRM (Customer Relationship Management), ERP (Enterprise Resource
Planning), and other business applications that track daily operations.

– Transaction Data: Includes sales records, purchase orders, invoices,


and other transactional data that provide insights into business
processes and customer behavior.

– HR Systems: Information related to employee performance, payroll,


and workforce analytics.
Types of Data Sources
• External Data Sources

– Social Media: Data from platforms like Twitter, Facebook, and


LinkedIn, providing insights into customer sentiments and market
trends.

– Market Research: Data from surveys, research reports, and external


databases that provide industry benchmarks and competitor analysis.

– Public Data: Government databases, economic indicators, and other


publicly available information that can be used to supplement internal
data.
Types of Data Sources
• External Data Sources

– Social Media: Data from platforms like Twitter, Facebook, and


LinkedIn, providing insights into customer sentiments and market
trends.

– Market Research: Data from surveys, research reports, and external


databases that provide industry benchmarks and competitor analysis.

– Public Data: Government databases, economic indicators, and other


publicly available information that can be used to supplement internal
data.
ETL (Extract, Transform, Load)
• In business intelligence, an ETL tool extracts data from one or more data-sources,
transforms it and cleanses it to be optimized for reporting and analysis, and loads
it into a data store or data warehouse.
• Extract
– The first step in the ETL process, extraction, involves gathering data from
different data sources, which could include databases, flat files, spreadsheets,
ERP systems, or even web sources like social media.
• Transform
– Once the data is extracted, it needs to be cleaned, enriched, and formatted to
make it usable for analysis. This transformation step ensures that all data
follows a consistent structure and format, making it easier to integrate and
analyze.
ETL (Extract, Transform, Load)
– Common Transformation Tasks:
• Data Cleaning: Removing duplicates, correcting errors, and handling
missing data.
• Data Standardization: Converting data into a common format or
structure. For example, dates might be stored in different formats across
systems, and these would need to be standardized.
• Data Normalization: Structuring the data to eliminate redundancy and
improve efficiency, especially when integrating data from different
systems.
• Load
– The final step in the ETL process is loading the transformed data into a data
warehouse, data lake, or other storage solutions where it can be accessed for
analysis. The storage location is usually designed for optimized querying,
reporting, and analysis.
• ETL Tools
– Informatica PowerCenter, Talend, Apache Nifi, Microsoft SQL Server
Integration Services (SSIS), AWS Glue, Google Cloud Dataflow
Data Warehousing
• The process of collecting, storing, and managing large
volumes of data from multiple sources to support decision-
making and analysis.

• Data warehouses are designed to provide a centralized


repository for an organization’s data, enabling users to
perform complex queries, generate reports, and derive
insights.

• A process of transforming data into information and making


it available to users in a timely enough manner to make a
difference.

• It typically serves as the backbone for BI systems.

– Example: Amazon Redshift, Google BigQuery, Snowflake,


Microsoft Azure Synapse Analytics, IBM Db2 Warehouse,
Teradata, SAP Data Warehouse Cloud, Apache Hive.
Data Warehousing
Characteristics of a Data Warehouse
• Subject-Oriented
– organized around key business subjects (e.g., sales, customer data, finance) rather than
daily operations.
• Integrated
– Data is gathered from multiple heterogeneous sources (e.g., transactional databases,
ERP systems, CRM software) and integrated into a unified format to ensure consistency
across the organization.
• Non-Volatile
– Data is stable in a data warehouse. More data is added but data is never removed. This
enables management to gain a consistent picture of the business.
• Time-Variant
– Data warehouses store historical data, allowing for comparisons across different time
periods. This supports time-series analysis, helping businesses to identify patterns and
trends over time.
• Optimized for Read Operations
– Unlike operational databases (OLTP), which are optimized for transactional processing
(inserts, updates), data warehouses are optimized for read-heavy operations. This
means queries, reports, and analyses are processed quickly even when handling large
volumes of data.
Data Warehouse Architecture
Types of Data Warehouses
• On-Premises Data Warehouse
– An on-premises data warehouse is deployed, hosted, and managed within an
organization's own physical data center. The organization is responsible for
purchasing, maintaining, and upgrading hardware and software for the
warehouse.
– Organizations with sensitive or regulated data that cannot be stored in the
cloud (e.g. National Bank of Pakistan, Habib Bank Limited, Telecom
Companies, Government Agencies).
• Cloud-Based Data Warehouse
– A cloud-based data warehouse is hosted on cloud platforms like Amazon Web
Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure. It allows
organizations to store and analyze data without the need for on-premises
infrastructure.
– Startups and growing businesses that need to scale their infrastructure quickly
and without heavy upfront costs.
– E.g. Daraz, Sapphire Group.
Types of Data Warehouses
• On-Premises Data Warehouse
– An on-premises data warehouse is deployed, hosted, and managed within an
organization's own physical data center. The organization is responsible for
purchasing, maintaining, and upgrading hardware and software for the
warehouse.
– Organizations with sensitive or regulated data that cannot be stored in the
cloud (e.g. National Bank of Pakistan, Habib Bank Limited, Telecom
Companies, Government Agencies).
• Cloud-Based Data Warehouse
– A cloud-based data warehouse is hosted on cloud platforms like Amazon Web
Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure. It allows
organizations to store and analyze data without the need for on-premises
infrastructure.
– Startups and growing businesses that need to scale their infrastructure quickly
and without heavy upfront costs.
– E.g. Daraz, Sapphire Group.
Data Mart vs. Data Warehouse
Data Mart Data Warehouse

A large, centralized repository that stores


A smaller, focused subset of a data warehouse
integrated data from multiple sources across the
for a specific department or business function.
organization.

Department-specific (e.g., finance, sales) Enterprise-wide, covering all organizational data

Contains relevant data for a specific area, may


Combines data from multiple diverse sources.
pull data from a data warehouse.

Smaller in size, limited to a particular function Large in size, storing vast amounts of data from
or department. across the organization.

Tactical decision-making within a specific Strategic decision-making that requires data from
department. across the business.
From DWH to Data Marts
BI System Architecture
OLAP (Online Analytical Processing)
• OLAP is software for performing multidimensional analysis at high speeds on large
volumes of data from a data warehouse, data mart, or some other unified,
centralized data store.
– Example: Microsoft SQL Server Analysis Services (SSAS), Oracle OLAP, SAP BW
(Business Warehouse), IBM Cognos Analytics, MS Power BI.
Business Intelligence Platforms
• Business Intelligence (BI) platforms are comprehensive software solutions that
enable organizations to collect, process, and analyze large amounts of data.
• These platforms provide a centralized framework that supports decision-making by
offering tools for data integration, reporting, and analysis.
• A BI platform is a foundation for building custom BI applications, dashboards, and
reports, allowing businesses to make data-driven decisions.
• Common BI Platforms and Tools:
– Microsoft Power BI, Tableau, QlikView / Qlik Sense, SAP BusinessObjects,
Oracle BI, IBM Cognos, Looker.
Business Intelligence Applications
1. Finance and Banking

– Banks such as Habib Bank Limited (HBL), MCB are using BI tools for risk management, fraud
detection, and compliance with regulatory standards such as those from the State Bank of Pakistan
(SBP). Predictive analytics helps detect fraudulent transactions and monitor financial irregularities in
real time.

– Data analytics help banks identify high-value customers, track spending patterns, and improve
customer retention through targeted marketing campaigns.

2. Telecommunications

– Jazz, Telenor, and Zong, uses BI platforms to optimize their network infrastructure. BI helps in
analyzing call data, bandwidth usage, and network performance, which results in improved customer
experience and operational efficiency.

– Data-driven insights help telcos develop better pricing models, offer relevant products, and enhance
customer support.

3. Retail & E-Commerce

– Daraz and Foodpanda use BI tools to manage inventory levels and forecast demand. BI helps in
identifying fast-moving items, optimizing stock levels, and streamlining supply chain operations.

– E-commerce businesses utilize BI to analyze customer purchasing behavior, preferences, and


feedback. This enables personalized recommendations, targeted marketing, and improved product
offerings based on consumer data.
Business Intelligence Applications
4. Healthcare:
– Adopted by healthcare providers such as Shifa International and Aga Khan University
Hospital for monitoring patient outcomes, optimizing resource allocation, and tracking
patient history.
– Forecast patient admission rates and optimize hospital resources like staffing and
equipment. This is particularly useful during public health crises, such as the COVID-19
pandemic, where data-driven insights were essential for managing patient flow.
5. Manufacturing:
– Packages Limited and Atlas Honda use BI tools to monitor production processes, track
key performance indicators (KPIs), and optimize supply chain logistics.
6. Government and Public Sector:
– The Punjab Information Technology Board (PITB) uses BI for data-driven governance,
monitoring public service delivery, and improving transparency in government projects.
– BI is used in public health for tracking disease outbreaks, managing health data, and
allocating resources effectively. During the COVID-19 pandemic, BI dashboards provided
real-time data on infection rates, hospital bed occupancy, and vaccination drives.

You might also like