Module 3 - Data Analysis in Excel
• Data preparation
○ Excel tables
Content
Introduction to Microsoft Excel
Microsoft Excel is a spreadsheet application developed by Microsoft.
Since its release, it has become the industry standard for spreadsheets.
Once the data has been imported, its rows and columns will be shown on the
spreadsheet.
Data preparation
To get the most out of Excel's data management tools, your data must adhere
to some basic standards. This is why we clean up and format data in Excel.
2. Click the Format as Table icon in the Styles group on the Home tab.
Excel will automatically pick the range of your dataset and recognise the headers in it.
Click OK.
3. You'll notice the banded table style and the Table Design tab, where you can change how the table looks.
Table formulas
Converting Excel data to tables helps prevent problems with cleaning, sorting and formatting.
Let us examine the Table formulas worksheet in the data_prep file:
1. Create a new column, New Comp., in column K. Notice the formatting.
2. Calculate the new compensation, which is the old compensation plus the old compensation multiplied by the percentage increase in cell M1.
=[Comp.]+[Comp.]*M1
Notice the formatting. To keep the reference to cell M1 fixed for every row, make it absolute:
=[Comp.]+[Comp.]*$M$1
3. Press Enter. Notice what happens.
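To cross-check the result outside Excel, the same arithmetic can be sketched in Python. This is only an illustration: the compensation figures, the 5% increase and the names `compensation`, `increase` and `new_comp` are assumptions, not values from the data_prep file.

```python
# Hypothetical data: these figures are NOT taken from the data_prep file.
compensation = [50000.0, 62000.0, 71500.0]
increase = 0.05  # plays the role of the fixed cell $M$1


def new_comp(old, pct):
    """Return the old compensation plus the percentage increase."""
    return old + old * pct


# Every row uses the same `increase`, just as every table row uses $M$1.
new_compensation = [new_comp(c, increase) for c in compensation]
print(new_compensation)
```

Because `increase` is a single shared value, it behaves like the absolute reference $M$1: it does not shift from row to row.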
Another method: name your table data, create a column called NC, and use this formula:
=data[Comp.]+data[Comp.]*$M$1 and see the difference on
Box 1:
Calculate the average new compensation.
Let the output be in cell O3.
Use the table name in the formula.
What do you notice about the formula notation style?
Box 2:
Calculate the average new compensation.
Let the output be in cell O4.
Use the column range in the formula.
What do you notice about the formula notation style?
Box 3:
Use the methods in boxes 1 and 2 to calculate the number of blank fields in column H using the COUNTBLANK function.
Sorting
You can sort data in Excel using the Sort button on the Home tab or on the Data tab.
The Sort button on the Data tab gives you more options.
E.g. people who work full time with a job rating of 5 in a particular department.
You can also filter by text. For example, filter for all the departments that contain the word "service".
Task:
- Filter for all the people hired in the first quarter of 2019
- Filter for all the people hired full-time between 2010 and 2015
- Filter for all the people that have compensation between 70000 and 90000
- Filter for the top ten compensation amounts
Removing duplicates
Sometimes, data comes with duplicate rows.
The Remove Duplicates function removes these duplicate rows based on the columns you select.
We will use the Eliminate duplicate worksheet in the duplicates file for this lesson.
The Remove Duplicates button is under Data Tools on the Data tab.
Use this button to remove the duplicate rows in the dataset.
You can also identify the duplicate rows by inserting a column into the dataset.
In that column, write a nested IF and AND formula that checks whether a row matches the row below it:
=IF(AND(B2:L2=B3:L3), "dup", "unique")
Note that this is an array formula. You may need to press Ctrl+Shift+Enter to execute it
correctly.
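The row-by-row comparison above can be sketched in Python to show what the formula is doing. The sample rows and the function name `flag_adjacent_duplicates` are hypothetical. Like the formula, this sketch only compares a row with the row directly below it, so it assumes the data has been sorted so that duplicates sit next to each other.

```python
# Hypothetical rows standing in for columns B to L of the dataset.
rows = [
    ("A01", "Ada", "Sales"),
    ("A01", "Ada", "Sales"),
    ("B02", "Ben", "Finance"),
]


def flag_adjacent_duplicates(table):
    """Label a row "dup" if it equals the next row, otherwise "unique"."""
    flags = []
    for i, row in enumerate(table):
        next_row = table[i + 1] if i + 1 < len(table) else None
        flags.append("dup" if row == next_row else "unique")
    return flags


print(flag_adjacent_duplicates(rows))
```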
We will look at the Cell locking file for examples in this lesson.
If Else statement
Excel's IF function returns an output based on one or more criteria.
VLOOKUP is Excel's method of looking into another table and getting a match of a desired
value in the column.
Problem 1A
=VLOOKUP(B5,B5:D8,3)
Problem 1B
=VLOOKUP(B25,sale,5,FALSE)
Problem 1C
=VLOOKUP(Sheet1!B48,sale,5,FALSE)
=VLOOKUP(B48,sale,8,FALSE)
=VLOOKUP(B48,sale,10,FALSE)
Index & match
INDEX and MATCH serve as an alternative to VLOOKUP.
INDEX and MATCH are two functions in Excel that can be combined to bring data from
one table or workbook into another.
Copy the Product column from the Product table to the Sales table
Step 1: Insert a Product column in the Sales table.
Step 2: =INDEX(product[Product],MATCH(Sales[ProductID],product[ProductID],0))
Copy the Category column from the Product table to the Sales table
Copy the Segment column from the Product table to the Sales table
Copy the ManufacturerID column from the Manufacturer table to the Sales table
Copy the Manufacturer Name from the Manufacturer table to the Sales table
Copy the State from the location table to the Sales table
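As a rough illustration of the logic behind INDEX and MATCH, the sketch below finds the position of an ID in one list (the MATCH step) and returns the value at that position from another list (the INDEX step). The product IDs, product names and the helper `index_match` are made up for the example and do not come from the course files.

```python
# Hypothetical Product table, stored as two parallel columns.
product_ids = [101, 102, 103]
product_names = ["Widget", "Gadget", "Gizmo"]


def index_match(lookup_value, lookup_column, return_column):
    """Find the position of lookup_value (MATCH), then return the
    value at that position in return_column (INDEX)."""
    position = lookup_column.index(lookup_value)  # raises ValueError if absent
    return return_column[position]


# Hypothetical ProductID column of the Sales table.
sales_product_ids = [103, 101, 103]
sales_products = [
    index_match(pid, product_ids, product_names) for pid in sales_product_ids
]
print(sales_products)
```

Unlike VLOOKUP, the lookup column and the return column are passed separately, so the return column does not have to sit to the right of the lookup column.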
Excel syntax principles
When writing syntax in Excel, it is important to follow these steps:
3. You do not have to type the function in full into your cell.
After you type = and the first few letters of your function, the function you want should be highlighted.
You can then press Tab or Enter to let Excel fill in the function name for you.
COUNTIFS: to count the number of full-time workers with a job rating of 5 who have been with the company for more than 10
years.
=COUNTIFS(B:B, "Full Time", G:G, "5", E:E, ">10")
AVERAGEIFS: to calculate the average compensation of full-time workers in the dataset with a job rating of 5 who have
been with the company for more than 10 years.
MAXIFS: to calculate the maximum compensation of full-time workers in the dataset with a job rating of 5 who have
been with the company for more than 10 years.
MINIFS: to calculate the minimum compensation of full-time workers in the dataset with a job rating of 5 who have
been with the company for more than 10 years.
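All four functions apply the same three criteria and differ only in the final aggregation. A minimal Python sketch of that idea follows; the records, field order and the helper name `matches` are hypothetical and chosen only to make the filter visible.

```python
# Hypothetical records: (status, job rating, years in company, compensation).
records = [
    ("Full Time", 5, 12, 80000),
    ("Full Time", 5, 15, 90000),
    ("Full Time", 4, 20, 70000),
    ("Part Time", 5, 11, 40000),
    ("Full Time", 5, 3, 60000),
]


def matches(record):
    """Apply the three criteria used in the COUNTIFS example."""
    status, rating, years, _ = record
    return status == "Full Time" and rating == 5 and years > 10


selected = [comp for (*_, comp) in filter(matches, records)]

count = len(selected)                    # like COUNTIFS
average = sum(selected) / len(selected)  # like AVERAGEIFS
highest = max(selected)                  # like MAXIFS
lowest = min(selected)                   # like MINIFS
print(count, average, highest, lowest)
```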
Central tendency
Among the most common statistical operations in Excel are the mean, the median and the mode.
In cell D2, type the syntax =AVERAGE(A2:A12). You should get an average of 29.4545.
In cell D4, type the syntax =MEDIAN(A2:A12). You should get a median of 28.
In cell D6, type the syntax =MODE(A2:A12). You should get a mode of 28.
The mean, median and mode are measures of central tendency and they give us an idea of the "center" of
the data.
They are affected by the skewness of the data to varying degrees.
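The three measures can be compared side by side with Python's standard `statistics` module. The list `ages` below is a made-up sample, not the values in A2:A12, so the results differ from the worksheet; it is chosen so that one large value pulls the mean above the median.

```python
import statistics

# Hypothetical sample; NOT the worksheet data in A2:A12.
ages = [20, 28, 28, 31, 43]

mean_value = statistics.mean(ages)      # like AVERAGE
median_value = statistics.median(ages)  # like MEDIAN
mode_value = statistics.mode(ages)      # like MODE

print(mean_value, median_value, mode_value)
```

Here the mean is higher than the median and the mode because the single value 43 skews the sample to the right.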
In cell D2, type the syntax =MIN(A2:A41). You should get the minimum order
value of 1684.00.
In cell D4, type the syntax =MAX(A2:A41). You should get the maximum order
value of 9932.00.
Determining spread (Quartiles)
Quartiles divide your dataset into four segments:
The first (lowest) quartile, Q1
The second quartile, Q2 (the median)
The third quartile, Q3
There are two ways to analyse quartiles in Excel: the inclusive method (QUARTILE.INC) and the exclusive method (QUARTILE.EXC).
Classwork:
The results of the two methods are different. Why?
Complete the calculations for the other quartiles.
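Python's `statistics.quantiles` also offers an inclusive and an exclusive method, which are expected to follow the same interpolation rules as Excel's two quartile functions (this equivalence is an assumption worth verifying against your own worksheet). The sample `values` is hypothetical.

```python
import statistics

# Hypothetical, already sorted sample of eight values.
values = [1, 2, 3, 4, 5, 6, 7, 8]

# Inclusive: treats the sample minimum and maximum as the 0% and 100% points.
inclusive = statistics.quantiles(values, n=4, method="inclusive")

# Exclusive: assumes the population may extend beyond the sample extremes.
exclusive = statistics.quantiles(values, n=4, method="exclusive")

print("Q1, Q2, Q3 inclusive:", inclusive)
print("Q1, Q2, Q3 exclusive:", exclusive)
```

Both methods agree on the median, but the exclusive method pushes Q1 lower and Q3 higher because it places the cut points as if the data could extend past the observed minimum and maximum.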
Running averages
Running averages look at the average of data over a progressive time range.
A.
In cell C2, use the AVERAGE function to calculate the average of cell B2 alone. Use an absolute
reference so that the AVERAGE function always begins at cell B2.
Double-click the fill handle to populate the other cells in column C.
B.
In cell D4, use the AVERAGE function to calculate the average from January to March.
Double-click the fill handle to populate the other cells in column D.
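The two exercises correspond to two kinds of running average: part A averages everything from the first month up to the current month, while part B averages a sliding window of three months. A small Python sketch of both, using made-up monthly figures and the hypothetical helper names `running_average` and `moving_average`:

```python
def running_average(values):
    """Average from the first value up to each position (part A)."""
    averages = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages


def moving_average(values, window=3):
    """Average of the latest `window` values, starting once a full
    window is available (part B)."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical monthly figures for January to April.
monthly = [10, 20, 30, 40]
print(running_average(monthly))
print(moving_average(monthly))
```

The first list has one entry per month, while the second starts at the third month, just as the column D formula starts in cell D4.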
Procedure:
Click on a cell containing the values.
Go to the Charts group on the Insert tab and click Scatter.
This shows a chart of the correlation between the distance traveled to a store and the amount spent at
the store.
You can right-click on the chart and select the format option to change the format of the chart.
Functions of dashboards