What Is Business Analytics?: Satinderpal Kaur MBA3 (D)
Business Analytics helps you make better, faster decisions and automate processes. It helps
you answer key business questions and stay one step ahead of your competition. Some of
the basic questions in a retail environment could be:
DATA MODELLING
Data modeling is a process used to define and analyze data requirements needed to support
the business processes within the scope of corresponding information systems in
organizations. Therefore, the process of data modeling involves professional data modelers
working closely with business stakeholders, as well as potential users of the information
system.
According to Hoberman, data modeling is the process of learning about the data, and
the data model is the end result of the data modeling process. (Data modeling is a
technique in which data is converted into an easily understandable form, a blueprint,
that supports decision making.)
For example: a company wants to build a guest house (the end result). It calls in a building
architect (the data modeler), who explains how to go about it. The architect first establishes
the building requirements (the business requirements) and then draws up a blueprint for the
work (the developed data model).
In other words:
Data modeling is the formalization and documentation of existing processes and events that
occur during application software design and development. Data modeling techniques and
tools capture and translate complex system designs into easily understood representations of
the data flows and processes, creating a blueprint for construction and/or re-engineering.
A data model can be thought of as a diagram or flowchart that illustrates the relationships
between data.
There are several different approaches to data modeling, including:
1) Conceptual Data Model
2) Logical Data Model
3) Physical Data Model
1) Conceptual data model
A conceptual data model identifies the highest-level relationships between the different
entities. Features of conceptual data model include:
No attribute is specified.
Foreign keys (keys identifying the relationship between different entities) are
specified.
Physical considerations may cause the physical data model to be quite different from
the logical data model.
In this example:
Customer and Order are entities. The items listed inside each entity, such as Customer
Name, are attributes of the entity. The line connecting Customer and Order shows
the relationship between the two entities, specifically that a Customer can have 0 to many (or
any number of) orders.
ERDs can be used to model data at multiple levels of specificity, from the low-level physical
database model to mid-level logical database model, to the high-level business domain
model.
An ERD is a good choice if you have multiple concepts or database tables and are analyzing
the boundaries of each concept or table. By defining the attributes, you figure out what
belongs with each entity. By defining the relationships, you figure out how each entity relates
to the other entities in your model.
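The Customer and Order example above can be sketched in code. This is only an illustration: Customer Name is the one attribute named in the text, while order_id, total and the sample customers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    # Hypothetical attributes; the text names only the Order entity itself.
    order_id: int
    total: float


@dataclass
class Customer:
    customer_name: str  # the "Customer Name" attribute from the example
    # Relationship: one Customer can have 0 to many Orders.
    orders: List[Order] = field(default_factory=list)


alice = Customer("Alice")  # a customer with zero orders is still valid
bob = Customer("Bob", [Order(1, 25.0), Order(2, 40.0)])

order_counts = {c.customer_name: len(c.orders) for c in (alice, bob)}
print(order_counts)
```

Each class plays the role of an entity, each field an attribute, and the orders list the relationship line between the two entities.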
2) Data Matrix
A Data Matrix provides more detailed information about the data model and can take a
variety of different forms. Typically a Data Matrix is captured in a spreadsheet format and
contains a list of attributes, along with additional information about each attribute. Some
common types of additional information that might be captured in a column in a data matrix
include the following:
Data Type
Allowable Values
Required or Optional
Sample Data
Notes
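The columns listed above can be sketched as a small data structure. This is a minimal illustration, not a standard format: the attribute names, sample values and the validate helper are all assumptions made for the example.

```python
# Each row of the data matrix describes one attribute of the data model.
# All attribute names and values below are hypothetical.
data_matrix = [
    {"attribute": "customer_name", "data_type": str, "allowable_values": None,
     "required": True, "sample_data": "Asha Verma", "notes": "Full name"},
    {"attribute": "status", "data_type": str,
     "allowable_values": {"active", "inactive"},
     "required": True, "sample_data": "active", "notes": "Account state"},
    {"attribute": "phone", "data_type": str, "allowable_values": None,
     "required": False, "sample_data": "98765-43210", "notes": "Optional contact"},
]


def validate(record, matrix):
    """Return a list of problems found when checking a record against the matrix."""
    problems = []
    for row in matrix:
        name = row["attribute"]
        value = record.get(name)
        if value is None:
            if row["required"]:
                problems.append(f"{name}: missing required value")
            continue
        if not isinstance(value, row["data_type"]):
            problems.append(f"{name}: expected {row['data_type'].__name__}")
        elif row["allowable_values"] and value not in row["allowable_values"]:
            problems.append(f"{name}: value not allowed")
    return problems


problems = validate({"customer_name": "Asha Verma", "status": "paused"}, data_matrix)
print(problems)
```

Here the optional phone attribute may be absent, while the status value falls outside the allowable values and is reported.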
A Data Matrix is a good choice when it's necessary to analyze detailed information about
each attribute in your data model. This information is often used to design and build the
physical database and so is needed by the data architect or database developer. A sample data
matrix is included with the Data Model sample in the Visual Model Sample Pack.
Chandigarh Business School
(For example, a marketer who supplies every state of India sees his sales growing and wants
to know why. No single piece of information explains it; a number of factors will, such as
good quality or a lower price.)
3) Data Mapping Specification
A Data Mapping Specification shows how information stored in two different databases
connects. The databases are often part of two different information technology
systems, which may be owned by your organization, your organization and a third-party
vendor, or two cooperating organizations.
For Example, when I worked for an online job board company, we created a data mapping
specification to define how we'd import job content from some of our bigger clients who did
not wish to manually input the details of each job using our employer portal.
Any time you are connecting two systems together through a data exchange or import, a data
mapping specification will be a good choice. A sample data mapping specification and
template are included in the Business Analyst Template Toolkit.
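In code, a data mapping specification can be pictured as a lookup from source fields to target fields. The sketch below assumes a job import like the one described above; every field name in it is hypothetical.

```python
# Hypothetical mapping: field in the client's feed -> field in the job board system.
FIELD_MAP = {
    "JobTitle": "title",
    "Loc": "location",
    "Desc": "description",
}


def map_record(source_record, field_map=FIELD_MAP):
    """Translate one source record into the target system's field names."""
    missing = [src for src in field_map if src not in source_record]
    if missing:
        raise KeyError(f"source record lacks mapped fields: {missing}")
    # Source fields that are not in the mapping (such as "Ref") are not imported.
    return {target: source_record[src] for src, target in field_map.items()}


client_job = {"JobTitle": "Analyst", "Loc": "Chandigarh",
              "Desc": "Build reports", "Ref": "X1"}
imported = map_record(client_job)
print(imported)
```

A real specification would usually also record data types and transformation rules for each field; this sketch covers only the field-to-field correspondence.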
4) Data Flow Diagram
A Data Flow Diagram illustrates how information flows through, into, and out of a system.
Data Flow Diagrams can be created using a simple workflow diagram or one of two formal
notations listed in the BABOK Guide: the Yourdon Notation or the Gane-Sarson Notation.
A Data Flow Diagram does not tell you much about what data is created or maintained by a
system, but it does tell you a lot about how the data flows through the system or a set of interconnected systems. A Data Flow Diagram shows the data stores, data processes, and data
outputs.
A Data Flow Diagram is a good choice if your data goes through a lot of processing, as it
helps clarify when and how those processes are executed. Then, each data store could be
modeled using an ERD and/or Data Matrix and each process using a Data Mapping
Specification. Samples of data flow diagrams in all three notations are included in the Visual
Model Sample Pack.
MULTIDIMENSIONAL MODELING
Dimensional Data Model
Dimensional data model is most often used in data warehousing systems. This
is different from the 3rd normal form, commonly used for transactional (OLTP)
type systems. As you can imagine, the same data would then be stored
differently in a dimensional model than in a 3rd normal form model. To
understand dimensional data modeling, let's define some of the terms commonly
used in this type of modeling:
SCHEMA
1) Star Schema
In the star schema design, a single object (the fact table) sits in the middle and is radially
connected to other surrounding objects (dimension lookup tables) like a star. Each dimension
is represented as a single table. The primary key in each dimension table is related to a
foreign key in the fact table.
Sample star schema
All measures in the fact table are related to all the dimensions that fact table is related to. In
other words, they all have the same level of granularity.
A star schema can be simple or complex. A simple star consists of one fact table; a complex
star can have more than one fact table.
Let's look at an example: Assume our data warehouse keeps store sales data, and the different
dimensions are time, store, product, and customer. In this case, the figure on the left
represents our star schema. The lines between two tables indicate that there is a primary key /
foreign key relationship between the two tables. Note that different dimensions are not related
to one another.
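The store sales example can be sketched with an in-memory SQLite database. The table names, column names and sample rows are assumptions made for illustration; only the four dimensions (time, store, product, customer) come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_store    (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: one foreign key per dimension, plus the measure.
CREATE TABLE fact_sales (
    time_id     INTEGER REFERENCES dim_time(time_id),
    store_id    INTEGER REFERENCES dim_store(store_id),
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    amount      REAL
);

INSERT INTO dim_time VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO dim_store VALUES (1, 'Chandigarh'), (2, 'Delhi');
INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Notebook');
INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 1, 10.0), (1, 2, 2, 2, 50.0), (2, 1, 2, 1, 30.0);
""")

# Each dimension joins directly to the fact table; dimensions never join to each other.
sales_by_city = conn.execute("""
    SELECT s.city, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store s ON f.store_id = s.store_id
    GROUP BY s.city
    ORDER BY s.city
""").fetchall()
print(sales_by_city)
```

Any report on this schema needs at most one join per dimension, which is the practical appeal of the star layout.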
2) Snowflake Schema
The snowflake schema is an extension of the star schema, where each point of the star
explodes into more points. In a star schema, each dimension is represented by a single
dimensional table, whereas in a snowflake schema, that dimensional table is normalized into
multiple lookup tables, each representing a level in the dimensional hierarchy.
Sample snowflake schema
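As a rough sketch of the difference, the example below normalizes a product dimension into a product table and a separate category lookup table. The tables, columns and rows are hypothetical and chosen only to show one level of the hierarchy being split out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- In a star schema, category would be a plain column of dim_product.
-- In the snowflake form, the category level gets its own lookup table.
CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    amount     REAL
);

INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Snacks');
INSERT INTO dim_product VALUES (1, 'Pen', 1), (2, 'Notebook', 1), (3, 'Chips', 2);
INSERT INTO fact_sales VALUES (1, 10.0), (2, 30.0), (3, 5.0);
""")

# Reaching the category level now takes two joins instead of one.
sales_by_category = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p  ON f.product_id = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
print(sales_by_category)
```

The extra join is the cost of normalization; the benefit is that each category name is stored once.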
3) Fact constellation
DATA MART
A data mart is a segment of a data warehouse that can provide data for reporting and analysis
on a section, unit, department or operation in the company, e.g. sales, payroll, production.
Data marts are sometimes complete individual data warehouses which are usually smaller
than the corporate data warehouse. A data mart is an indexing and extraction system. Instead
of putting the data from all the departments of a company into a warehouse, a data mart
contains the databases of separate departments and can come up with information using
multiple databases when asked.
IT managers of a growing company are often unsure whether they should make
use of data marts or instead switch over to the more complex and more expensive data
warehousing. Both tools are easily available in the market, but choosing between them
poses a dilemma for IT managers.
Difference between Data Warehousing and Data Mart
It is important to note that there are large differences between these two tools, though they
may serve the same purpose. Firstly, a data mart contains the programs, data, software and
hardware of a specific department of a company. There can be separate data marts for finance,
sales, production or marketing. All these data marts are different, but they can be coordinated.
The data mart of one department is different from the data mart of another department, and
though indexed, this system is not suitable for a very large database, as it is designed to meet
the requirements of a particular department.
Data warehousing is not limited to a particular department; it represents the database of a
complete organization. The data stored in a data warehouse is more detailed, though indexing
is light, as it has to store huge amounts of information. It is also difficult to manage and takes
a long time to process. It follows that data marts are quick and easy to use, as they make
use of small amounts of data. Data warehousing is also more expensive for the same
reason.
DATA WAREHOUSING
A data warehouse is a collection of data marts representing historical data from different
operations in the company. This data is stored in a structure optimized for querying and data
analysis as a data warehouse. Table design, dimensions and organization should be consistent
throughout a data warehouse so that reports or queries across the data warehouse are
consistent. A data warehouse can also be viewed as a database for historical data from
different functions within a company. This is the place where all the data of a company is
stored. It is actually a very fast computer system having a large storage capacity. It contains
data from all the departments of the company where it is constantly updated to delete
redundant data. This tool can answer all complex queries pertaining to the data.
DATA INTEGRATION
Data integration involves combining data from several disparate sources, which are stored
using various technologies, and providing a unified view of the data. Data integration becomes
increasingly important in cases of merging systems of two companies or consolidating
applications within one company to provide a unified view of the company's data assets. The
latter initiative is often called a data warehouse.
Probably the best-known implementation of data integration is building an enterprise's
data warehouse. The benefit of a data warehouse is that it enables a business to perform
analyses based on the data in the data warehouse. This would not be possible on the data
available only in the source systems. The reason is that the source systems may not contain
corresponding data: even though the data are identically named, they may refer to different
entities.
EXTRACT
The purpose of the extraction process is to reach into the source systems and collect the data
needed for the data warehouse.
Usually data is consolidated from different source systems that may use a different data
organization or format so the extraction must convert the data into a format suitable for
transformation processing. The complexity of the extraction process may vary and it depends
on the type of source data. The extraction process also includes selection of the data as the
source usually contains redundant data or data of little interest.
For the ETL extraction to be successful, it requires an understanding of the data layout. A
good ETL tool additionally enables storage of an intermediate version of the data being
extracted. This is called the "staging area" and makes reloading raw data possible in case of
a later loading problem, without re-extraction. The raw data should also be backed up and
archived.
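The extraction step described above (select the needed data, convert it to a common format, keep a staged copy) can be sketched as follows. The CSV content, the column names and the JSON-lines staging file are all assumptions made for the example, not features of any particular ETL tool.

```python
import csv
import io
import json
import os
import tempfile

# Hypothetical export from a source system; internal_flag is of no interest to the warehouse.
SOURCE_CSV = "id,name,internal_flag\n1,Pen,x\n2,Notebook,y\n"
WANTED_COLUMNS = ["id", "name"]


def extract(source_text, wanted, staging_path):
    """Read source rows, keep only the wanted columns, and stage them as JSON lines."""
    reader = csv.DictReader(io.StringIO(source_text))
    rows = [{column: row[column] for column in wanted} for row in reader]
    with open(staging_path, "w", encoding="utf-8") as staging:
        for row in rows:
            staging.write(json.dumps(row) + "\n")
    return rows


def reload_from_staging(staging_path):
    """Re-read the staged rows without touching the source system again."""
    with open(staging_path, encoding="utf-8") as staging:
        return [json.loads(line) for line in staging]


staging_file = os.path.join(tempfile.mkdtemp(), "products.jsonl")
extracted = extract(SOURCE_CSV, WANTED_COLUMNS, staging_file)
restaged = reload_from_staging(staging_file)
print(extracted == restaged)
```

If a later load fails, reload_from_staging supplies the same rows again, which is the purpose of the staging area.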
TRANSFORM
DATA WAREHOUSE
DEFINITION
Different people have different definitions for a data warehouse. The most popular definition
came from Bill Inmon, who provided the following:
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision making process.
Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For
example, "sales" can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example,
source A and source B may have different ways of identifying a product, but in a data
warehouse, there will be only a single way of identifying a product.
Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data
from 3 months, 6 months, 12 months, or even older data from a data warehouse. This
contrasts with a transaction system, where often only the most recent data is kept. For
example, a transaction system may hold the most recent address of a customer, whereas a data
warehouse can hold all addresses associated with a customer.
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a
data warehouse should never be altered.
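The time-variant and non-volatile properties can be illustrated with the customer address example. This is a minimal sketch: the customer, the addresses, the dates and the helper functions are all hypothetical.

```python
from datetime import date

# Transaction system view: only the most recent address is kept.
oltp_customer = {"customer_id": 7, "address": "12 Lake Road"}

# Warehouse view: rows are only ever appended, never updated or deleted.
warehouse_addresses = []


def record_address(customer_id, address, valid_from):
    """Append a new address row; existing history is left untouched."""
    warehouse_addresses.append(
        {"customer_id": customer_id, "address": address, "valid_from": valid_from}
    )


def address_as_of(customer_id, when):
    """Return the address that was current for the customer on the given date."""
    rows = [r for r in warehouse_addresses
            if r["customer_id"] == customer_id and r["valid_from"] <= when]
    if not rows:
        return None
    return max(rows, key=lambda r: r["valid_from"])["address"]


record_address(7, "4 Hill Street", date(2023, 1, 1))
record_address(7, "12 Lake Road", date(2024, 6, 1))

print(address_as_of(7, date(2023, 12, 31)))
print(address_as_of(7, date(2024, 7, 1)))
```

The transaction record can answer only "where does the customer live now", while the appended history can answer the same question for any past date.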
An OLTP system deals with operational data. Operational data are those data involved in the
operation of a particular system.
Example: In a banking system, when you withdraw an amount from your account, the Account
Number, Withdrawal Amount, Available Amount, Balance Amount, Transaction Number, etc.
are operational data elements.
In other words, the ability to analyze metrics in different dimensions such as time,
geography, gender, product, etc. For example, sales for the company are up. What region is
most responsible for this increase? Which store in this region is most responsible for the
increase? What particular product category or categories contributed the most to the increase?
Answering these types of questions in order means that you are performing an OLAP
analysis. Depending on the underlying technology used, OLAP can be broadly divided into
two different camps: MOLAP and ROLAP. A discussion of the different OLAP types can
be found in the MOLAP, ROLAP, and HOLAP section.
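The sequence of questions above (which region, then which store, then which product category) amounts to a drill-down. A small sketch with made-up figures, using only plain Python rather than an OLAP tool:

```python
from collections import defaultdict

# Hypothetical increase in sales, broken down by three dimensions.
sales_increase = [
    {"region": "North", "store": "N1", "category": "Stationery", "increase": 40},
    {"region": "North", "store": "N1", "category": "Snacks", "increase": 10},
    {"region": "North", "store": "N2", "category": "Snacks", "increase": 20},
    {"region": "South", "store": "S1", "category": "Stationery", "increase": 15},
]


def top_by(rows, dimension):
    """Return the member of a dimension with the largest total increase."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row["increase"]
    return max(totals, key=totals.get)


def drill_down(rows, dimensions):
    """Answer the questions in order, narrowing the data after each answer."""
    path = []
    for dimension in dimensions:
        best = top_by(rows, dimension)
        path.append((dimension, best))
        rows = [row for row in rows if row[dimension] == best]
    return path


path = drill_down(sales_increase, ["region", "store", "category"])
print(path)
```

Each step aggregates along one dimension and then filters to the winning member before moving to the next, finer level.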
MOLAP:
This is the more traditional way of OLAP analysis. In MOLAP, data is stored in a
multidimensional cube. The storage is not in the relational database, but in proprietary
formats.
Advantages:
Excellent performance:
MOLAP cubes are built for fast data retrieval, and are optimal for slicing and dicing
operations.
Can perform complex calculations:
All calculations have been pre-generated when the cube is created. Hence, complex
calculations are not only doable, but they return quickly.
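The idea of pre-generating calculations when the cube is created can be illustrated with a toy example. This is a dictionary-based sketch under simplifying assumptions (two dimensions, one summed measure, made-up rows); it does not represent the proprietary storage formats real MOLAP products use.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("region", "product")

# Hypothetical fact rows.
facts = [
    {"region": "North", "product": "Pen", "sales": 10},
    {"region": "North", "product": "Notebook", "sales": 30},
    {"region": "South", "product": "Pen", "sales": 5},
]


def build_cube(rows, dimensions, measure):
    """Pre-compute the measure total for every combination of dimensions."""
    cube = defaultdict(int)
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                key = tuple((d, row[d]) for d in dims)
                cube[key] += row[measure]
    return dict(cube)


cube = build_cube(facts, DIMENSIONS, "sales")

# Queries are now plain lookups; nothing is aggregated at query time.
print(cube[()])                                         # grand total
print(cube[(("region", "North"),)])                     # slice on region
print(cube[(("region", "North"), ("product", "Pen"))])  # dice on both dimensions
```

All the summing happens once, inside build_cube, which is why the later lookups return quickly.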
Disadvantages:
DATA MINING
DEFINITION OF 'DATA MINING'
A process used by companies to turn raw data into useful information. By using software to
look for patterns in large batches of data, businesses can learn more about their customers and
develop more effective marketing strategies as well as increase sales and decrease costs. Data
mining depends on effective data collection and warehousing as well as computer processing.
Data mining is a logical process that is used to search through large amounts of data in order
to find useful data. The goal of this technique is to find patterns that were previously
unknown. Once these patterns are found, businesses can use them to make decisions
that support their development.
The three steps involved are:
1) Exploration
2) Pattern identification
3) Deployment
1) Exploration: In the first step, data is cleaned and transformed into another form, and
the important variables and the nature of the data are determined based on the
problem.
2) Pattern Identification: Once the data has been explored, refined and defined for the specific
variables, the second step is pattern identification: identify and choose the patterns
which make the best prediction.
3) Deployment: Patterns are deployed for desired outcome.
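The three steps can be sketched on a toy shopping-basket example. The baskets, the pair-counting approach and the recommend helper are illustrative assumptions; real data mining tools use far more elaborate techniques.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw purchase data, with inconsistent spelling and an empty basket.
raw_baskets = [
    ["Pen", "notebook "],
    ["pen", "Notebook", "Eraser"],
    ["notebook", "pen", "ink"],
    [],
]

# 1) Exploration: clean the data and put it into a consistent form.
baskets = [sorted({item.strip().lower() for item in basket})
           for basket in raw_baskets if basket]

# 2) Pattern identification: find the pair of items bought together most often.
pair_counts = Counter(pair for basket in baskets for pair in combinations(basket, 2))
best_pair, support = pair_counts.most_common(1)[0]


# 3) Deployment: use the pattern, here as a simple product suggestion.
def recommend(item):
    first, second = best_pair
    if item == first:
        return second
    if item == second:
        return first
    return None


print(best_pair, support)
print(recommend("pen"))
```

The cleaning step matters: without it, "Pen" and "pen" would be counted as different items and the pattern would be missed.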
ADVANTAGES AND DISADVANTAGES OF DATA MINING
Data mining can aid direct marketers by providing them with useful and accurate
trends about their customers' purchasing behavior.
Based on these trends, marketers can direct their marketing attention to their
customers with more precision.
For example, marketers of a software company may advertise their new
software to consumers who have a long history of software purchases.
In addition, data mining may also help marketers in predicting which products their
customers may be interested in buying.
Through this prediction, marketers can surprise their customers and make the
customer's shopping experience a pleasant one.
Retail stores can also benefit from data mining in similar ways.
For example, through the trends provided by data mining, the store managers can
arrange shelves, stock certain items, or provide a certain discount that will attract their
customers.
2) Banking/Crediting:
Data mining can assist financial institutions in areas such as credit reporting and loan
information.
For example, by examining previous customers with similar attributes, a bank can
estimate the level of risk associated with each given loan.
In addition, data mining can also assist credit card issuers in detecting potentially
fraudulent credit card transactions.
Although the data mining technique is not 100% accurate in its prediction about
fraudulent charges, it does help the credit card issuers reduce their losses.
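The loan example above, estimating risk from previous customers with similar attributes, can be sketched as a nearest-neighbour calculation. The figures, the two attributes chosen and the estimate_risk function are assumptions made for illustration; this is not how any particular bank scores loans.

```python
# Hypothetical history: (income in thousands, existing debt in thousands, defaulted?)
previous_customers = [
    (30, 20, True),
    (35, 25, True),
    (80, 10, False),
    (90, 5, False),
    (60, 30, False),
]


def estimate_risk(income, debt, history, k=3):
    """Share of the k most similar previous customers who defaulted."""
    by_similarity = sorted(
        history,
        key=lambda c: (c[0] - income) ** 2 + (c[1] - debt) ** 2,
    )
    nearest = by_similarity[:k]
    return sum(1 for customer in nearest if customer[2]) / k


risk_low_income = estimate_risk(32, 22, previous_customers)
risk_high_income = estimate_risk(85, 8, previous_customers)
print(risk_low_income, risk_high_income)
```

As the text notes for fraud detection, an estimate like this is not 100% accurate; it only summarizes what happened to similar customers in the past.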
3) Law enforcement:
Data mining can aid law enforcers in identifying criminal suspects as well as
apprehending these criminals by examining trends in location, crime type, habit, and other
patterns of behavior.
4) Researchers:
Data mining can assist researchers by speeding up their data analysis process, thus
allowing them more time to work on other projects.
DISADVANTAGES OF DATA MINING
1) Privacy Issues:
Personal privacy has always been a major concern in this country. In recent years,
with the widespread use of the Internet, concerns about privacy have increased
tremendously. Because of the privacy issues, some people do not shop on the Internet. They
are afraid that somebody may have access to their personal information and then use that
information in an unethical way, thus causing them harm.
Although it is against the law to sell or trade personal information between different
organizations, such sales have occurred. For example, according to
the Washington Post, in 1998, CVS had sold their patients' prescription purchases to a different
company.
In addition, American Express also sold their customers' credit card purchases to
another company. What CVS and American Express did clearly violated privacy law
because they were selling personal information without the consent of their customers.
The selling of personal information may also bring harm to these customers because
you do not know what the other companies are planning to do with the personal
information that they have purchased.
2) Security issues:
Although companies have a lot of personal information about us available online, they
do not have sufficient security systems in place to protect that information.
For example, recently the Ford Motor Credit Company had to inform 13,000
consumers that their personal information, including Social Security number, address,
account number and payment history, was accessed by hackers who broke into a database
belonging to the Experian credit reporting agency.
This incident illustrates that companies are willing to disclose and share your
personal information, but they are not taking care of the information properly. With so
much personal information available, identity theft could become a real problem.
Trends obtained through data mining, intended to be used for marketing or for
some other ethical purpose, may be misused.
Unethical businesses or people may use the information obtained through data
mining to take advantage of vulnerable people or discriminate against a certain group of
people.
In addition, the data mining technique is not 100 percent accurate; thus mistakes do
happen, which can have serious consequences.