Assignment 0a Excel

Download as pdf or txt
Download as pdf or txt
You are on page 1of 8

INTRODUCTION TO BUSINESS ANALYTICS

Assignment 0.a, Excel


05-09-2022

This assignment goes through basic modeling in and several functionalities of Excel,
forming a basis for the rest of the course. The exercises below correspond to questions in
the Canvas quiz. The Excel file containing all the numerical answers and charts must be
submitted to the last question of the quiz.

A. Excel from Scratch


A.1 Floating-point Errors
Do not believe everything the computer tells you. Take a look at Equation (1).

f (p) = 10p + 10−p − 10−p − 10p = 0. (1)

(a) Investigate why Equation (1) does not always hold in Excel.
Open an Excel workbook and name the first sheet ‘A.1’ and program f (p) in two
ways. You may use figure 1 as an example on how to format your sheet. First, put
the value of p in C2, taking p = 3. Next, put the outcome of f (p) in C3, by filling in
the formula in C3 and referring to cell C2 to get the value of p. Second, put 10p in C5
and 10−p in C6, and calculate =C5+C6-C5-C6 in cell C4.

The two should be equal, but one of the two is not even 0. This kind of errors do not only
happen in Excel, but happen in any programming language like Java, Python, and R.
So, what is happening? Excel stores Double Precision Floating Point numbers, a.k.a.
Doubles. These are 8-byte variables, storing numbers with 15 decimal places. Even when
one sees a number with two decimals, the underlying value has 15 decimal places.
1 1
(b) Investigate 3 + 3 + 13 .
Put ‘=1/3’ in three cells below each other and make sure Excel interprets the cell as a
number1 . Select these cells and go to Home (Tab) > Number (Group) > Decrease Decimal
(Command), till all three cells show ‘0.3’. Adding these three cells together will result
in 1.0, although visually 0.3 + 0.3 + 0.3 = 0.9 6= 1.0. Showing that the underlying
value is used, regardless of the formatting of the cells.

Figure 1: A sketch on how one could format their sheet.


1
Excel may misinterpret the type of the cell, e.g. sometimes it sees numbers as dates. Select the cell

1
Assignment 0.a, Excel Introduction to Business Analytics

The way we have learned to store numbers, is in base-10, for instance, 13 = (3 × 100 ) +
(1×101 ). Computers store in binary, thus 42 = (0×20 )+(1×21 )+(0×22 )+(1×23 )+(1×24 ).
Likewise, fractions are stored as 0.75 = (1 × 2−1 ) + (1 × 2−2 ) + (0 × 2−3 ). The consequence
is that it is impossible to store the number 1/10 = 0.1 in binary form without a small error.
These are no errors or ‘bugs’, but well-known limitations.

A.2 Linear Expansion


The expansion of a bar, ∆L, is linearly correlated with the length of the bar and the
difference in temperature, ∆L = αL∆T . Here, α is the expansion coefficient, L denotes
the length of the bar at temperature T = 20◦ C, and ∆T is the difference in temperature.
This process is depicted in figure 2. Open a new sheet and name it ‘A.2’.

Figure 2: The expansion of a metal bar.

(a) Create an Excel table that depicts the difference in length of bars with lengths of
L = 0, 20, 40, . . . , 100 meters in the columns, at temperature differences of ∆T =
0, 5, 10, . . . , 75◦ C in the rows, with α = 12 × 10−6 . This expansion coefficient corre-
sponds with Concrete2 .
First, create the row and column headers with the values of ∆T and L. Next, cre-
ate a formula in the table that uses both absolute as well as relative references3 .
One should create a formula that can auto-filled the whole table, by dragging the
fill-handle. Make sure that there is the flexibility to fill in different values for the
expansion coefficient α, i.e., reserve a separate cell outside of the table to store the
value of α and reference to it in the formula. An example of how to format your file
is given in Figure 3.

Cell names make formulas readable. Give the cell containing the numeric value of α
the name Alpha and adjust the table formula to include this cell names. Lastly, copy the
table4 and paste its transpose next to it5 .

and go to Home (Tab) > Number (Group) > Number Formats (Command) and select the correct format from
the drop-down menu.
2
en.wikipedia.org/wiki/Thermal expansion
3
$B$2 is absolute row and columns, B$2 is relative columns and absolute rows, and $B2 is absolute
columns and relative rows.
4
Select the complete table, copy it on the clipboard by Ctrl+C, the Copy command in the Home tab, or
right clicking on cell to find Copy in the Edit Menu.
5
Select an empty cell and use Edit menu > Paste special > Transpose or clicking on the arrow below
Paste in the Clipboard group.

Page 2 of 8
Assignment 0.a, Excel Introduction to Business Analytics

Figure 3: A sketch on how one could format their sheet.


A.3 Fibonacci sequence
The Fibonacci sequence is a beautiful example of a recursive definition, as each element is
determined by previous elements. It was first described in 1202 by Leonardo Fibonacci and
is defined by Equation (2).

Ft = Ft−1 + Ft−2 , with t > 1, F0 = 0, F1 = 1 (2)


The Fibonacci sequence can be observed in the growth of a colony of bacterial cells. An
adult bacterial cell multiplies following a process called binary fission. In an hour the cell
reproduces itself into one adult and one non-adult new cell as a result. Each adult cell
reproduces itself, again and again, every hour. It takes exactly one hour for a non-adult
new cell to develop into an adult cell that can reproduce itself. This reproduction process
means that the cell population at a general time t is equal to the t-th Fibonacci number
Ft . So, Ft shows how many bacterial cells there are after t hours.
(a) Calculate Ft for t ≥ 0 and its logarithm Lt for t > 0.
Rename a new sheet to ‘A.3’. In this sheet, create a column t = 0, ..., 40. Add a
column Ft in which you calculate the Fibonacci numbers following Equation (2), in
which you start the recursion with F0 = F1 = 1. Then, add a column in which the
logarithm base 2 of the number of bacterial cells is calculated, Lt = log2 (Ft ). Use the
LOG function for this. An example of how to format your file is given in Figure 4.
(b) Calculate the golden ratio Rt for t > 1.
Add a column to the table, in which one calculates the ratio between the number of
bacterial cells at time epochs t and t − 1, Rt = Ft /Ft−1 . This ratio is known as the
golden ratio.

Page 3 of 8
Assignment 0.a, Excel Introduction to Business Analytics

Figure 4: A sketch on how one could format their sheet.


(c) Create a graph with appropriate axis titles in which both the logarithm and the ratio
are displayed against t.
A graph can be created by going to Insert (Tab) > Charts (Group) and choosing
the appropriate chart that fits your data. If you did not select data beforehand or
want to change elements, click on the graph > Chart Design (Tab) > Data (Group)
> Select Data (Command). Within Select Data, one can add data column by
column, by clicking Add and selecting the column name under Series Name and the
column data under Series Values. Furthermore, in the Select Data Source menu one
can edit or remove series and edit the horizontal x-axis. By clicking on the graph >
Chart Design (Tab) > Chart Layouts (Group) > Add Chart Element (Command),
to add an axis, title, or Legend.

Always make sure your graph has a relevant title and axis titles, and, if needed, a legend.

A.4 Numerical Integration


Many applications of mathematics involve calculating the integral of some function f (x),
Rb
F (x) = a f (x)dx, also known as the antiderivative. However, for some forms of f (x),
finding the antiderivative is impossible or f (x) is unknown as we only have certain values
that have been determined by experimental measurements. To still be able to determine

Page 4 of 8
Assignment 0.a, Excel Introduction to Business Analytics

the antiderivatve, one can approximate it using numerical integration.


As start, divide the interval [a, b] into n subintervals of length h = (b − a)/n with the
n + 1 points x0 = a, x1 = a + h, x2 = a + 2h, ..., xn = a + nh = b. Suppose that the values
of f (xi ) at these points are known, for instance, via experiments. We can now approximate
the antiderivative with numerical integration. We will look at the Trapezoid Rule, also
visualized in Figure (5).

Figure 5: Visualization of the Trapezoid Rule.


Rb
The Trapezoid Rule approximates a f (x)dx by using straight line segments between
two points, namely xi and xRi+1 . Following figure Figure (5), conclude that the area under
x
this straight line is Fi (x) = xii+1 f (x)dx ≈ h2 (f (xi ) + f (xi+1 )), which is the approximate of
the integral for that subinterval. The approximate for the Rwhole interval can be determined
b
by using the SUM function to sum F0 + F1 + ... + Fn−1 ≈ a f (x)dx.
R π/2 −x2 dx
(a) Approximate the antiderivative F (x) = −π/2 e with the Trapezoid Rule for both
n = 10 and n = 20.
Create a new sheet ‘A.4’ and make a table to fill in the start a and end b of the
whole interval, together with the number of subintervals n and the subinterval lengths
h = (b − a)/n. Use the function PI() for a and b. Below it, create a table with the
first column x, containing the values x0 , ..., xn , the second column the corresponding
value f (xi ). Check if Excel calculates f (xi ) correctly, one may need to use extra
brackets to ensure that −(x2 ) is as expected. The third column is the the Trapezoid
approximation F0 , ..., Fn−1 . Also create a total row in the bottom to calculate the
sum F0 + F1 + ... + Fn−1 with the SUM function. An example of how to format your
file is given in Figure 6.

Page 5 of 8
Assignment 0.a, Excel Introduction to Business Analytics

Figure 6: A sketch on how one could format their sheet.


B. Births in the United States
The data used in this section is from the National Center for Health Statistics of the United
States. It considers all live births in 2018. Due too the size of the original file (3,8 million
data records), a sample of 100,000 records was extracted randomly. A documentation file
is added to the assignment explaining the features.

B.1 Gender
Analyze the number of male against female babies born and give the result in percentages.
This exercise can be solved in multiple ways, choose one. One can use COUNTIF and COUNT
or use a Pivot Table. When one uses the latter, put the Pivot table on a new sheet which
should be named B.1. One can put any of the features in the Values field when constructing
the pivot table. Click on the chosen feature in the Values field and choose the Value Field
Settings to summarize the value field by Count. Moreover, choose to Show Values As ‘%
of Grand Total’ in the Value Field Settings.

B.2 Time
Calculate the number of babies born per hour.
Add a column to the dataset which calculates in which hour of the day the baby is born.
Use the following definition, a baby is born in hour ‘0’ when the birth time was between 0-59
and in hour ‘1’ between 100-159, etc. For this, one can, for instance, use the ROUNDDOWN or
INT function. Next, analyze the dataset using a pivot table on a new sheet called B.2. Set
the hour on the Rows and the delivery method on the Columns, counting the occurrences.

Page 6 of 8
Assignment 0.a, Excel Introduction to Business Analytics

B.3 Age
Create age groups by making bins of size m, 0 − m, m − 2m, ....
We want to gain insight in the age of the mother and father, by making age groups such as
20 − 24, 25 − 29. The group size is defined as m, such that we can change it easily. Create
a new table with two columns, the lower bound and upper bound of the age group. The
first column will be (0, m, m + m, ...) and the second column will be (m − 1, m + m − 1, ...).
If m = 5, this will result in (0, 5, 10, ...) and (4, 9, 14, ...). Add a third column creating the
labels of the group, thus 0 − 4, 5 − 9, .... This can be done with TEXTJOIN using " - " as
delimiter.
Count per age group how many mothers and separately how many fathers are in the
data.
Put the outcomes in the next two columns of your table. For this, use the COUNTIFS function
with two criteria. The function requests a Criteria range, which is the data column with
the actual age of the mothers or fathers. Moreover, the function requests the Criteria which
follow the syntax ">="&A1 and "<="&B1, with A1 and B1 references to the lower and upper
bound of the group. Note that in your worksheet, the references can be to other cells. Test
if it works to change m to different values.

Page 7 of 8
Assignment 0.a, Excel Introduction to Business Analytics

B.4 Age Combinations


Given the age group of the mother and the father, count how many times the age group
combinations occur.
We finish this analysis by using the INDEX and MATCH functions. Add two columns to the
dataset, one with the age group of the mother and one with the age group of the father.
With MATCH, match the parent’s age to the age group, by looking up the age in the lower
bound age group column you created in B.3, using match type 1, which makes MATCH find
the smallest value that is greater than or equal to the lookup value. With INDEX, search in
the column with age group labels you created in B.3, to the corresponding label. Now, one
can create a pivot table of the data, putting the age group of the mother as Rows and the
age group of the father as Columns. Put the pivot table on a new sheet called B.4. This
enables us to analyze which combinations of age groups are common. When one changes
m, refresh the pivot table, by right-clicking on it and choosing Refresh.

Page 8 of 8

You might also like