Module 3 - Data Analysis in Excel

Download as pdf or txt
Download as pdf or txt
You are on page 1of 38

Data analysis (Excel)

• Introduction to Microsoft Excel


○ The Excel interface
○ Reading data into Excel

• Data preparation
○ Excel tables

Content ○ Table formulas


○ Sorting
○ Filtering
○ Removing duplicates
● Cell locking
● If Else Statement
● Vlookup
● Index match
● Excel syntax principles
● Conditional arithmetic procedures

Content ● Central tendency


● Determining spread
○ Min and max
○ Quartiles
• Running averages
• Histogram
• Scatter plots
• Forecasting
• Elementary data visualisation
• Principles of data visualization

Content
Introduction to
Microsoft Excel
Introduction to Microsoft Excel
Microsoft Excel is a spreadsheet developed by Microsoft.

It features calculation, graphing tools, pivot tables, and a macro programming


language called Visual Basic for Applications (VBA).

Since it’s release, it has become the industry standard for spreadsheets.

Excel forms part of the Microsoft Office suite of software.


The Excel interface
Row and column nomenclature
Reading data into Excel
Microsoft excel allows the user to read in data of various types into the interface.
The most common types are Comma Separated Value (CSV), xlsx and text file.

Data is imported using the open tool in the file tab.

Once the data has been imported, it’s rows and columns will be shown on the
spreadsheet.
Data
preparation
Data preparation
To optimise your use of various data management tools, you must make your
data adhere to some basic standards. This is why we clean up and format data
in excel.

Let us examine the PoorDesign worksheet in the data_prep file;

The title/headers are in multiple lines


use command =CONCATENATE( , " ", )
Copy and paste special
There is an empty row
There is an hidden column…Can you find it?

There is inconsistent data formats in column F and H


- Home Tab >> under Number change format
Excel tables
Converting excel data to tables helps prevent problems with cleaning, sorting and formatting

Let us examine the Table Conversion worksheet in the data_prep file;


Before converting your data to table, you must…
1. Make sure the data is contiguous.
Select all the cells using control A
Use control . to move round the four corners of the data to check if there are empty cells or rows.
Do this step on the last worksheet we used. What do you notice?

2. Click on the Format as table icon in the styles group in the home tab.
Excel will automatically pick the range of your dataset and recognise the headers in it
Click ok

3. You’ll notice the banded table style and the table design tab where you can change how the table looks
Table formulas
Converting excel data to tables helps prevent problems with cleaning, sorting and formatting
Let us examine the Table formulas worksheet in the data_prep file;
1. Create a new column New Comp. in column K. Notice the formatting

2. Calculate the new compensation which is the old compensation * the percentage increase.
=[Comp.]+[Comp.]*M1
Notice the formatting. =[Comp.]+[Comp.]*$M$1
3. Press enter. Notice what happens
Another method, name your table as data and create a column call it NC, use this formula
=data[Comp.]+data[Comp.]*$M$1 and see the difference on
Calculate the average new Calculate the average new Use the method in the box 1
compensation . compensation . & 2 to the left to calculate
the number of blank fields
Let the output be in cell O3. Let the output be in cell O4. in column H using the
countblank function
Use the table name in the Use the column range in the
formula. formula.

What do you notice about the What do you notice about the
formula notation style? formula notation style?
Sorting
You can sort data in excel using the sort button in the home tab or in the data tab.

The sort button in the data tab gives you more options.

We will be using the Sorting file for this lesson.


It’s a great idea to make sure your list is in a table before sorting

To group the data by department, status and years of experience;

-Click the sort button in the data tab


- Add the columns you want to sort and the order
- Click ok
Filtering
Filtering helps you to see a selected section of your data
We will be using the Filtering file for this lesson.
With a filter, we can extract data that meet different criteria

Eg. People that work full time with a job rating of 5 in a particular department.

You can also filter by text. For example, filter for all the departments that contain the word
“service”

Task:
- Filter for all the people hired in the first quarter of 2019
- Filter for all the people hired full-time between 2010 and 2015
- Filter for all the people that have compensation between 70000 and 90000
- Filter for the top ten compensation amounts
Removing duplicates
Sometimes, data comes with duplicate rows
The remove duplicate function helps us to remove these duplicates rows by specific rows.

We will use the Eliminate duplicate tab in the duplicates file for this lesson.
The remove duplicates button is under data tools in the data tab.
Use this button to remove the duplicate rows in the dataset.

You can also identify the duplicate rows by inserting a column into the dataset.
In that column, write a nested IF AND function to check if there are duplicates
=IF(AND(B2:L2=B3:L3), “dup”, “unique”)

Note that this is an array function. You may need to use cntrl+shift+enter to execute this
command correctly.
Question slide

Which of these are true?


1. You can import images and videos into Excel
2. Microsoft Excel is the only application that can read spreadsheet data
3. All data must be turned into excel tables once imported
4. Excel tables cannot be formatted
Cell locking
Cell locking is the attribute of excel to pin a calculation or function to a specific cell during
analysis.

We will look at the Cell locking file for examples on this lesson
If Else statement
Excel If else statement returns an output based on multiple criterias

Using the IfElse file;


Sample
I. Calculate the Total score for each student Create a column Pass/Fail
=IF([Total]<50, "fail", "pass")
Ii. Create a column that returns
F for students that score below 29
E for students that score between 30 and 39
D for students that score between 40 and 49
C for students that score between 50 and 59
B for students that score between 60 and 69
A for students that score between 70 and above
Vlookup
Vlookup means Vertical lookup.

Vlookup is excel’s method of looking into another table and getting a match of a desired
value in the column.

I’ll explain…with the Vlookup file.

Problem 1A
=VLOOKUP(B5,B5:D8,3)
Problem 1B
=VLOOKUP(B25,sale,5,FALSE)
Problem 1c
=VLOOKUP(Sheet1!B48,sale,5,FALSE)
=VLOOKUP(B48,sale,8,FALSE)
=VLOOKUP(B48,sale,10,FALSE)
Index & match
Index and Match serves as an alternative to vlookup.

Index and match are two formulas in excel that can be joined together to move data from
one workbook to another.

We will use the Index-match file for this lesson.

Copy the Product column from the Product table to the Sales table
Step 1- insert product column in sales table
Step 2- =INDEX(product[Product],MATCH(Sales[ProductID],product[ProductID],0))
Copy the Category column from the Product table to the Sales table
Copy the Segment column from the Product table to the Sales table
Copy the ManufacturerID from the Manufacturer table column to the Sales table
Copy the Manufacturer Name from the Manufacturer table to the Sales table
Copy the State from the location table to the Sales table
Excel syntax principles
When writing syntax in excel, it is important to follow these steps;

1. Write the = sign


2. Start by typing the name of the function you want to perform (e.g; type SU and SUM will pop up)
Note that if there is no = sign, the function suggestion pane will not pop up.
Notice the difference in the two cells in the diagram

3. You do not have to type the function in full into your cell.
After typing = and the first few letters of your function, the function you want should be highlighted.
You can then press tab or enter to allow excel fill up the formula for you.

4. Your function must be followed by parenthesis “()”.


You will type what you want the function to do in the parenthesis/bracket.
Question slide

Which of these are true?


1. Hlookup and Vlookup perform the same function
2. Index and match are separate functions that can be joined together
3. Some excel formulas work without the = sign
4. None of the above
Conditional
arithmetic
procedures
Conditional arithmetic procedures
Conditional arithmetic procedures are arithmetic functions that use the IF clause
Most of the functions in excel can be found under the formula tab in excel
We will use the Con_arith file for this lesson.

Arithmetic functions include;


Countif: to count the number of full time workers in the dataset =COUNTIF(B:B, "Full Time")
Sumif: to calculate the total compensation of full time workers in the dataset
Averageif: to calculate the average compensation of full time workers in the dataset

Countifs: to count the number of full time workers with a job rating of 5 and have been in the company more than 10
years. =COUNTIFS(B:B, "Full Time", G:G, "5", E:E, ">10")
Averageifs: to calculate the average compensation of full time workers in the dataset with a job rating of 5 and have
been in the company more than 10 years.
Maxifs: to calculate the Maximum compensation of full time workers in the dataset with a job rating of 5 and have
been in the company more than 10 years.
Minifs: to calculate the minimum compensation of full time workers in the dataset with a job rating of 5 and have
been in the company more than 10 years.
Central tendency
The most common operations in excel are the mean, median and the mode.

We will be using the MeansAndMedian file for this section.

In Cell D2, type the syntax, =AVERAGE(A2:A12). You should get an average of 29.4545
In Cell D4, type the syntax, =MEDIAN(A2:A12). You should get a median of 28
In Cell D6, type the syntax, =MODE(A2:A12). You should get a mode of 28

The mean, median and mode are measure of central tendency and they give us an idea of the”center” of
the data.
They are affected by the skewness of data to varying degree.

Discussion: What do the other two types of modes do?


Activity: Change cell A11 to 31. Notice any changes in the output cells? Why do you think they changed?
Determining spread (Min & Max)
The most common operations in excel are the mean, median and the mode.

We will be using the MinMax file for this section


Remember to hold down the cntrl, shift key and down arrow key from cell A2 to
select up to cell A41 .

In Cell D2, type the syntax, =MIN(A2:A41). You should get the minimum order
value of 1684.00
In Cell D4, type the syntax, =MAX(A2:A41). You should get the minimum order
value of 9932.00
Determining spread (Quartiles)
Quartiles divide your dataset into four segments;
The lowest quartile, Q1
The second quartile, Q2
The median
The fourth quartile, Q3

There are two ways to analyse quartiles in excel. The inclusive and exclusive method

The inclusive method:


In Cell D7, type the syntax, =QUARTILE.INC(A2:A41,1). 1 indicates the first quartile
The exclusive method:
In Cell G7, type the syntax, =QUARTILE.EXC(A2:A41,1). 1 indicates the first quartile

Classwork:
The results in each method is different. Why?
Complete the calculations for the other quartiles
Running averages
Running averages look at the average of data over a progressive time range

We will be using the RunningAverage file for this section

A.
In cell C2, use the average function to calculate the average in cell B2 alone. Use the absolute
reference to make the average function always begin at cell B2.
Double click the fill handle to populate the other cells in column C.

How is this result calculated?

B.
In cell D4, use the average function to calculate the average from January to March
Double click the fill handle to populate the other cells in column D.

How is this result calculated?


Forecasting
Forecasting answers the question of what happens next based on data gathered on current trends

We will be using the forecast file for this section

Start with highlighting cells B2:B9 in the Trend worksheet


Drag the fill handle (the little green box at the tip of the last highlighted cell) down to row 13

Open the Forecast worksheet


Notice that the amount spent does not have a progression like the Trend sheet

To predict the amount a customer will spend if he travels 30 miles:


I. type 30 in cell D2
Ii. in cell E2, use the FORECAST.LINEAR function.
Where;
X = miles driven,
Known_ys = Dependent variable = Amount spent
Known_xs = independent variable = Distance driven
Histograms
A histogram is a chart that shows the number of values in given ranges.
Each range is called a bin. All bins have equal widths.

We will be using the Histogram file for this section


Procedure:
Click on a cell containing the values
Go to the charts group in the insert tab, click histogram.
You can right click on the chart and select the format option to change the format of the
chart.
Change the width of the bins.
Scatter plots
A scatter plots gives the relationship between two values.

We will be using the XYScatter file for this section

Procedure:
Click on a cell containing the values
Go to the charts group in the insert tab, click scatter.
This shows a chart of the correlation between distance traveled to a store and the amount spent at
the store.
You can right click on the chart and select the format option to change the format of the chart.

Note that correlation is not causation.


Scatter plots visualise numerical data types only

Task: Add a trendline to the chart. What does it mean?


Question slide

Which of these are true?


1. Running and normal averages produce different results
2. Running and normal averages produce the same results
3. The size of bins determine the number of bars in a histogram
4. All of the above
Elementary
data
visualisation
What is a dashboard?

- A dashboard is a visual display of the most important information needed to


achieve one or more objectives; consolidated and arranged on a single screen so
the information can be monitored at a glance. - Stephen few (Information
dashboard design)

Functions of dashboards

- They optimize our understanding of complex systems and business processes


through a consolidated format.
- To perform descriptive analysis
- To perform performance analysis in respect to time
Steps in creating a visual
- Create an excel table
- Create a pivot table
- Visualise with pivot charts
- Slicers and timers

What are the principles of visualisation?


Types of charts and how to create them

We will be looking at the questions in the


visualisation file, answering each question
by creating pivot tables and charts with
slicers.
Steps in creating a visual
Create an excel table
Create a pivot table
Visualise with pivot charts
Slicers and timers

What are the principles of visualisation?


Types of charts and how to create them

We will be looking at the questions in the visualisation file, answering each


question by creating pivot tables and charts with slicers.
Question slide

When creating a visual…


1. The design should be tailored to the analyst’s taste
2. The design should be tailored to the end user’s taste
3. Legends should be hidden
4. All of the above
THANK YOU

You might also like