Database Analytics

Download as pdf or txt
Download as pdf or txt
You are on page 1of 29

Database

Analytics
Muhammad Faisal
Universitas YARSI
Data sets and Database
A data set is simply a collection of data. Marketing survey responses, a table of
historical stock prices, and a collection of measurements of dimensions of a
manufactured item are examples of data sets.

A database is a collection of related files containing records on people, places, or


things. Databases provide structure to data. The people, places, or things for which
we store and maintain information are called entities.
A database is usually organized in a two-dimensional table, where the columns
correspond to each individual element of data (called fields, or attributes), and the
rows represent records of related data elements.
Using Range Names in Databases
Using range names for data arrays (that is, rows, columns, or rectangular matrix
arrays) can greatly simplify many database calculations.

Excel also provides a convenient way to include all data in a row or column in a
function. For example, the range for column B is expressed as B:B; the range for
row 4 is expressed as 4:4, and so on. Therefore, =SUM(B:B) will calculate the sum
of all data in column B; in Figure 2.2, this will be the total sales of Product A. The
function =SUM(4:4) will calculate the sum of all data in row 4; in Figure 2.2, this will
be the sum of January sales for all products. Using these ranges is particularly
helpful if you add new data to a spreadsheet; you would not have to adjust any
defined range names.
Using Range Names in Databases (Cont’d..)
Data Queries: Tables, Sorting, and Filtering

Managers make numerous queries about data. A manager might be interested in


comparing the amount of financial assets as related to the number of months
employed, finding all loans of a certain type, or the proportion of low credit risk
individuals who own a home. To address these queries, we need to sort the data in
some way. Extracting a set of records having certain characteristics is called filtering
the data. Excel provides a convenient way of formatting databases to facilitate such
types of analyses, called Tables.
Data Queries: Tables, Sorting, and Filtering (Cont’d..)

Creating an Excel Table (use your own data)

First, select the range of the data, including headers (a useful shortcut is to select the first cell in the upper left
corner, then click Ctrl+Shift+down arrow, and then Ctrl+Shift+right arrow; on a Mac, use the Command key
instead of Ctrl). Next, click Table from the Tables group on the Insert tab and make sure that the box for My
Table Has Headers is checked. (You may also just select a cell within the table and then click on Table from the
Insert menu. Excel will choose the table range for you to verify.) The table range will now be formatted and
will continue automatically when new data are entered. Figure 2.4 shows a portion of the result. Note that the
rows are shaded and that each column header has a drop-down arrow to filter the data. If you click within a
table, the Table Tools Design tab will appear in the ribbon, allowing you to do a variety of things, such as
change the color scheme, remove duplicates, and change the formatting.
A useful option in the Design tab is to add a total row. In the Table Style Options group, check the box for
Total Row, and Excel will add a new row at the bottom of the table. If you click on any cell in the total row,
you can select the type of calculation you want from the drop-down box; this includes SUM, AVERAGE,
COUNT, MAX, and MIN.
Data Queries: Tables, Sorting, and Filtering (Cont’d..)
Data Queries: Tables, Sorting, and Filtering (Cont’d..)

Sorting Data in Excel

Excel provides many ways to sort lists by rows or column or in ascending or descending order and using
custom sorting schemes. The sort buttons in Excel can be found under the Data tab (see Figure 2.5; the Mac
Data tab ribbon is similar). Select a single cell in the column you want to sort on and click the “AZ down
arrow” button to sort from smallest to largest or the “AZ up arrow” button to sort from largest to smallest.
You may also click the Sort button to specify criteria for more advanced sorting capabilities.
Data Queries: Tables, Sorting, and Filtering (Cont’d..)
Pareto Analysis

Pareto analysis is a term named after an Italian economist, Vilfredo Pareto, who, in 1906, observed that a large
proportion of the wealth in Italy was owned by a relatively small pro-portion of the people. The Pareto principle is
often seen in many business situations. For example, a large percentage of sales usually comes from a small
percentage of customers, a large percentage of quality defects stems from just a couple of sources, or a large
percent-age of inventory value corresponds to a small percentage of items. As a result, the Pareto principle is also
often called the “80–20 rule,” referring to the generic situation in which 80% of some output comes from 20% of
some input. A Pareto analysis relies on sorting data and calculating the cumulative percentage of the characteristic
of interest.
Data Queries: Tables, Sorting, and Filtering (Cont’d..)
Data Queries: Tables, Sorting, and Filtering (Cont’d..)
Filtering Data
For large data files, finding a particular
subset of records that meet certain
characteristics by sorting can be tedious.
Filtering simplifies this process. Excel
provides two filtering tools: AutoFilter for
simple criteria and Advanced Filter for
more complex criteria. These tools are best
understood by working through some
examples.
Data Queries: Tables, Sorting, and Filtering (Cont’d..)

Database Functions

You are already familiar with basic Excel functions such as SUM, AVERAGE, COUNT, and so on. Database
functions start with a “D” (for example, DSUM, DAVERAGE, DCOUNT) and allow you to specify criteria that
limit the calculations to a subset of records in a database using the same format as the Advanced Filter. You
can find all database functions by selecting the category Database in the Insert Function dialog. For example,
the syntax for the DSUM function is DSUM(database, field, criteria). Database is the range that includes the
column labels; field is the column name that contains the values to sum, enclosed in quotation marks, or a
reference to the column name; and criteria is the range that specifies the records you want to sum (the
same format as in Figure 2.12).
Data Queries: Tables, Sorting, and Filtering (Cont’d..)
Data Queries: Tables, Sorting, and Filtering (Cont’d..)
Data Queries: Tables, Sorting, and Filtering (Cont’d..)
Logical Functions

Logical functions depend on whether one or more conditions are true or false. A condition is a
statement about the value of a cell, either numeric or text. Three useful logical functions in
business analytics applications are the following:

IF(condition, value if true, value if false)—a logical function that returns one value if the
condition is true and another if the condition is false
AND(condition 1, condition 2, . . . )—a logical function that returns TRUE if all conditions are
true and FALSE if not
OR(condition 1, condition 2, . . . )—a logical function that returns TRUE if any con-dition is
true and FALSE if not
Logical Functions
Logical Functions
Lookup Functions for Database Queries

Excel provides some useful lookup functions for finding specific data in a spreadsheet. These are
the following:
VLOOKUP(lookup_value, table_array, col_index_num, [range lookup]) looks up a value in the
leftmost column of a table (specified by the table_array) and returns a value in the same row
from a column you specify (col_index_num).
HLOOKUP(lookup_value, table_array, row_index_num, [range lookup]) looks up a value in the top
row of a table and returns a value in the same column from a row you specify.
INDEX(array, row_num, col_num) returns a value or reference of the cell at the intersection of a
particular row and column in a given range.
MATCH(lookup_value, lookup_array, match_type) returns the relative position of an item in an
array that matches a specified value in a specified order.
CHOOSE(index_num, value1, value2, . . . ) returns a value from a list based on the position in the
list, specified by index_num.
Lookup Functions for Database Queries
Lookup Functions for Database Queries (Cont’d..)
Lookup Functions for Database Queries
Excel Template Design

Many database queries are repetitive; think of customer service representatives who
must look up order information, prices, and so on. We may use the logical and lookup
functions that we studied in the previous section to create user-friendly Excel templates
for repetitive database queries. We did this in Example 2.14 for finding sales for a
specific month and product name. In this section, we discuss some design approaches
and Excel tools.

An Excel template would typically have input cells, result cells, and possibly
intermediate calculations. Templates should be “clean,” well organized, and easy to
use. Validation tools should be used to ensure that users do not inadvertently make
errors in input data. Visualizations, such as charts, which will be discussed in the next
chapter, can be included for additional insight.
Excel Template Design
Excel Template Design

Data Validation Tools

Excel provides various data validation tools to reduce the chances that users will
make a mistake. First, select the cell range for which data validation will be
applied, and then choose Data Validation from the Data Tools group on the Data
tab in Excel 2016, or select Validation from the Data menu on the a Mac.
Excel Template Design
Excel Template Design
Form Controls
Form controls are buttons, boxes, and other mechanisms
The most common form controls are
for inputting or changing data on spreadsheets easily that
can be used to design user-friendly spreadsheets. They ■ Spin button—a button used to increase or decrease
allow the user to more easily interface with models to enter a numerical value
or change data without the potential of inadvertently
■ Scroll bar—a slider used to change a numerical
introducing errors in formulas. To use form controls, you
value
must first acti-vate the Developer tab on the ribbon. Click
the File tab, then Options, and then Customize Ribbon. ■ Check box—a box used to select or deselect a
Under Customize the Ribbon, make sure that Main Tabs is scenario
displayed in the drop-down box, and then click the check
■ Option button—a radio button used to select an
box next to Developer (which is typically unchecked in a
option
standard Excel installation). You will see the new tab in the
Excel ribbon, as shown in Figure 2.26. If you click the Insert ■ List box—a box that provides a list of options
button in the Controls group, you will see the form controls
■ Combo box—a box that provides an expandable list
available (do not confuse these with the Active X Controls in
of options
the same menu!). On a Mac, select Excel 7 Preferences 7
View and check the box for Developer tab. The Mac ribbon ■ Group box—a box that can hold a group of controls
is different and shows graphics of the form controls.
PIVOT TABLES

Excel provides a powerful tool for distilling a complex database into meaningful information: PivotTables
(yes, it is one word!). PivotTables allows you to create custom summaries and charts of key information
in the data and to drill down into a large set of data in numerous ways. In this section, we will introduce
PivotTables in the context of databases; however, we will also use them in other chapters to facilitate
data visualization and statistical analysis of data sets.

To apply PivotTables, first, select any cell in the database and choose PivotTable under the Insert tab
and follow the steps of the wizard. We will use the Sales Transactions database (see Figure 2.18). Excel
first asks you to select a table or range of data; if you click on any cell within the database before
inserting a PivotTable, Excel will default to the complete range of your data. You may either put the
PivotTable into a new worksheet or in a blank range of the existing worksheet. Excel then creates a blank
PivotTable, as shown in Figure 2.30.

In the PivotTable Fields window on the right side of Figure 2.30 is a list of the fields that correspond to the
headers in the database. You select which ones you want to include, as row labels, column labels, values,
or a filter. You should first decide what types of tables you wish to create—that is, what fields you want for
the rows, columns, and data values.
PIVOT TABLES

You might also like