Inspecting A Dataset
As a data analyst, you'll use data to answer questions and solve problems. When you analyze data
and draw conclusions, you are generating insights that can influence business decisions, drive
positive change, and help your stakeholders meet their goals.
Before you begin an analysis, it’s important to inspect your data to determine if it contains the
specific information you need to answer your stakeholders’ questions. In any given dataset, it may
be the case that:
The data is not there (you have sandwich data, but you need pizza data)
The data is insufficient (you have pizza data for June 1-7, but you need data for the entire
month of June)
The data is incorrect (your pizza data lists the cost of a slice as $250, which makes you
question the validity of the dataset)
Inspecting your dataset will help you pinpoint what questions are answerable and what data is still
missing. You may be able to recover this data from an external source or at least recommend to your
stakeholders that another data source be used.
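The three checks above (data not there, data insufficient, data incorrect) can be sketched in code. This is a minimal Python illustration, not part of the original reading. The sample rows, the year 2023, the plausible price range, and the function names are all hypothetical.

```python
from datetime import date, timedelta

# Hypothetical sample rows: (sale date, item, price per slice in dollars).
rows = [
    (date(2023, 6, 1), "pizza", 3.50),
    (date(2023, 6, 2), "pizza", 3.50),
    (date(2023, 6, 3), "pizza", 250.00),  # suspicious price
]


def has_item(rows, item):
    """Is the data there? True if at least one row is about the item."""
    return any(name == item for _, name, _ in rows)


def missing_dates(rows, start, end):
    """Is the data sufficient? Return dates in [start, end] with no row."""
    present = {day for day, _, _ in rows}
    total_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(total_days))
    return [day for day in all_days if day not in present]


def out_of_range_prices(rows, low, high):
    """Is the data correct? Return rows whose price is outside [low, high]."""
    return [row for row in rows if not (low <= row[2] <= high)]


gaps = missing_dates(rows, date(2023, 6, 1), date(2023, 6, 30))
suspect = out_of_range_prices(rows, 0.50, 20.00)
print(has_item(rows, "pizza"), len(gaps), suspect)
```

Each function answers one of the three questions, so the result tells you which kind of problem to raise with your stakeholders.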
In this reading, imagine you’re a data analyst inspecting spreadsheet data to determine if it’s
possible to answer your stakeholders’ questions.
The scenario
You are a data analyst working for an ice cream company. Management is interested in improving
the company's ice cream sales.
The company has been collecting data about its sales—but not a lot. The available data is from an
internal data source and is based on sales for 2019. You’ve been asked to review the data and
provide some insight into the company’s ice cream sales. Ideally, management would like answers to
the following questions:
1. What is the most popular flavor of ice cream?
2. How does temperature affect sales?
3. How do weekends and holidays affect sales?
4. How does profitability differ for new versus returning customers?
Download the data
You can download the data to follow along with this reading. To use the template for the sales data,
click the link below and select “Use Template.”
Link to template: Ice Cream Sales
OR
If you don’t have a Google account, you can download the spreadsheets directly from the
attachments below:
SalesByTemp (XLSX file)
SalesByDay (XLSX file)
SalesByFlavor (XLSX file)
So, which is it? It’s probably a daily snapshot because there are 365 entries for temperature, and
multiple rows with the same temperature and different sales values. This implies that each entry is
for a single day and not a summary of multiple days. However, without more information, you can’t
be certain. Plus, you don’t know if the current data is listed in consecutive order by date or in a
different order. Your next step would be to contact the owner of the dataset for clarification.
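The reasoning above (one row per day, and repeated temperatures with different sales values) can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the (temperature, sales) pairs are made up, and the function names are hypothetical. A passing check only supports the daily-snapshot reading; it does not confirm it.

```python
from collections import defaultdict

# Hypothetical (temperature, sales) pairs standing in for the temperature sheet.
records = [(60, 110), (60, 145), (72, 200), (85, 310), (85, 295)]


def temps_with_varying_sales(records):
    """Return temperatures that appear with more than one distinct sales value."""
    sales_by_temp = defaultdict(set)
    for temp, sales in records:
        sales_by_temp[temp].add(sales)
    return sorted(t for t, values in sales_by_temp.items() if len(values) > 1)


def looks_like_daily_snapshot(records, expected_days):
    """Heuristic: one row per expected day, and repeated temperatures whose
    sales differ (a per-temperature summary would have one row per value)."""
    return len(records) == expected_days and bool(temps_with_varying_sales(records))


print(temps_with_varying_sales(records))
print(looks_like_daily_snapshot(records, expected_days=5))
```

For the real sheet you would pass expected_days=365. The result is still only a hint, so confirming with the dataset owner remains the next step.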
If it turns out that temperature does affect sales, you’ll be able to offer your stakeholders an insight
such as the following: “When daily highs are above X degrees, average ice cream sales increase by Y
amount. So the business should plan on increasing inventory during these times to maximize sales.”
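To fill in X and Y for an insight like the one above, you could compare average sales on days above and below a temperature threshold. This is a minimal Python sketch; the sample values, the threshold of 75 degrees, and the function name are assumptions made for illustration only.

```python
from statistics import mean

# Hypothetical (daily high temperature, sales) pairs.
records = [(60, 100), (65, 120), (80, 200), (90, 260)]


def average_sales_split(records, threshold):
    """Return (average sales above threshold, average sales at or below it)."""
    above = [sales for temp, sales in records if temp > threshold]
    below = [sales for temp, sales in records if temp <= threshold]
    if not above or not below:
        raise ValueError("threshold leaves one group empty")
    return mean(above), mean(below)


avg_above, avg_below = average_sales_split(records, threshold=75)
increase = avg_above - avg_below  # the "Y amount" for X = 75
print(avg_above, avg_below, increase)
```

A difference in averages suggests a relationship but does not prove that temperature causes the change in sales.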
Question 3: How do weekends and holidays affect sales?
Next, you click on the sales tab to view the data about dates of sale. The sales sheet has two
columns and 366 rows of data. The column headers are date and sales. This data is most likely total
daily sales in 2019, as sales are recorded for each date in 2019.
You can use this data to determine whether a specific date falls on a weekend or holiday and add a
column to your sheet that reflects this information. Then, you can find out whether sales on the
weekends and holidays are greater than sales on other days. This will be useful to know for
inventory planning and marketing purposes.
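The added column described above can be sketched in Python. This is only an illustration: the sales figures are made up, and the holiday set is a hypothetical placeholder that you would replace with the holidays relevant to the business.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical holiday list; a real analysis needs the business's own calendar.
HOLIDAYS = {date(2019, 1, 1), date(2019, 7, 4)}

# Hypothetical (date, sales) rows standing in for the sales sheet.
daily_sales = [
    (date(2019, 7, 4), 500),  # Thursday, holiday
    (date(2019, 7, 5), 200),  # Friday
    (date(2019, 7, 6), 350),  # Saturday
    (date(2019, 7, 7), 330),  # Sunday
    (date(2019, 7, 8), 180),  # Monday
]


def day_type(day, holidays=HOLIDAYS):
    """Label a date as 'holiday', 'weekend', or 'weekday'."""
    if day in holidays:
        return "holiday"
    if day.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return "weekend"
    return "weekday"


def average_by_day_type(daily_sales):
    """Group sales by day type and return the average for each group."""
    groups = defaultdict(list)
    for day, sales in daily_sales:
        groups[day_type(day)].append(sales)
    return {label: mean(values) for label, values in groups.items()}


averages = average_by_day_type(daily_sales)
print(averages)
```

Holidays are checked before weekends, so a holiday that falls on a Saturday is counted once, as a holiday.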
Question 4: How does profitability differ for new customers versus returning customers?
Your dataset does not contain sales data related to new customers. Without this data, you won’t be
able to answer your final question. However, it may be the case that the company collects customer
data and stores it in a different data table.
If so, your next step would be to find out how to access the company’s customer data. You can then
join the sales data with the customer data table to categorize each sale as coming from a new or
returning customer and analyze the difference in profitability between the two sets of customers.
This information will help your stakeholders develop marketing campaigns for specific types of
customers to increase brand loyalty and overall profitability.
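If customer data did turn out to exist, the join described above might look like the following Python sketch. Everything here is assumed: the reading's dataset has no customer IDs or profit figures, so the table layouts, field names, and the rule "a sale on the customer's first purchase date counts as new" are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical customer table: customer ID -> date of first purchase.
first_purchase = {"c1": "2019-03-01", "c2": "2019-05-10"}

# Hypothetical sales rows: (customer ID, sale date, profit in dollars).
sales = [
    ("c1", "2019-03-01", 2.0),
    ("c1", "2019-04-15", 3.0),
    ("c2", "2019-05-10", 1.5),
    ("c1", "2019-06-01", 4.0),
]


def label_sales(sales, first_purchase):
    """Join each sale to the customer table and label it new or returning."""
    labeled = []
    for customer_id, sale_date, profit in sales:
        if customer_id not in first_purchase:
            kind = "unknown"  # no matching customer record
        elif first_purchase[customer_id] == sale_date:
            kind = "new"
        else:
            kind = "returning"
        labeled.append((customer_id, sale_date, profit, kind))
    return labeled


def average_profit_by_kind(labeled):
    """Return the average profit for each customer category."""
    groups = defaultdict(list)
    for _, _, profit, kind in labeled:
        groups[kind].append(profit)
    return {kind: mean(values) for kind, values in groups.items()}


profit_by_kind = average_profit_by_kind(label_sales(sales, first_purchase))
print(profit_by_kind)
```

Sales with no matching customer record are kept in a separate "unknown" group so that gaps in the customer data stay visible.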
Conclusion
When working on analytics projects, you won’t always have all the necessary or relevant data at
your disposal. In many of these cases, you can turn to other data sources to fill in the gaps.
Despite the limitations of your dataset, it’s still possible to offer your stakeholders some valuable
insights. For next steps, your best plan of action will be to take the initiative to ask questions,
identify other relevant datasets, or do some research on your own. No matter what data you’re
working with, carefully inspecting your data makes a big difference.
Metadata is as important as the data itself
Data analytics, by design, is a field that thrives on collecting and organizing data. In this reading, you
will learn how metadata helps you analyze and thoroughly understand your data.
Take a look at any data you find. What is it? Where did it come from? Is it useful? How do you know?
This is where metadata comes in to provide a deeper understanding of the data. To put it simply,
metadata is data about data. In database management, it provides information about other data
and helps data analysts interpret the contents of the data within a database.
Regardless of whether you are working with a large or small quantity of data, metadata is the mark
of a knowledgeable analytics team, helping to communicate about data across the business and
making it easier to reuse data. In essence, metadata tells the who, what, when, where, which, how,
and why of data.
Elements of metadata
Before looking at metadata examples, it is important to understand what type of information
metadata typically provides.
Title and description
What is the name of the file or website you are examining? What type of content does it contain?
Tags and categories
What is the general overview of the data that you have? Is the data indexed or described in a specific
way?
Who created it and when
Where did the data come from, and when was it created? Is it recent, or has it existed for a long
time?
Who last modified it and when
Were any changes made to the data? If yes, were the modifications recent?
Who can access or update it
Is this dataset public? Are special permissions needed to customize or modify the dataset?
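The elements above can be pictured as fields on a single record. The following Python sketch is only an illustration of that idea; the class name, field names, and sample values (including the user names and dates) are hypothetical and do not come from any particular metadata standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DatasetMetadata:
    """Hypothetical record mirroring the metadata elements listed above."""
    title: str
    description: str
    tags: List[str] = field(default_factory=list)
    created_by: str = ""
    created_on: Optional[date] = None
    last_modified_by: str = ""
    last_modified_on: Optional[date] = None
    editors: List[str] = field(default_factory=list)  # who can update it

    def can_update(self, user: str) -> bool:
        """True if the user is allowed to modify the dataset."""
        return user in self.editors


# Made-up values for illustration only.
ice_cream_meta = DatasetMetadata(
    title="Ice Cream Sales",
    description="Daily ice cream sales for 2019",
    tags=["sales", "2019"],
    created_by="analyst_a",
    created_on=date(2020, 1, 15),
    editors=["analyst_a"],
)
print(ice_cream_meta.can_update("guest"))
```

Fields left at their defaults, such as last_modified_on here, show at a glance which questions about the dataset are still unanswered.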
Examples of metadata
In today’s digital world, metadata is everywhere, and it is increasingly common for the media and
information you interact with to include metadata. Here are some real-world examples of where to
find metadata:
Photos
Whenever a photo is captured with a camera, metadata such as camera filename, date, time, and
geolocation are gathered and saved with it.
Emails
When an email is sent or received, there is a lot of visible metadata, such as the subject line, the
sender, the recipient, and the date and time sent. There is also hidden metadata that includes server
names, IP addresses, HTML format, and software details.
Spreadsheets and documents
Spreadsheets and documents are already filled with a considerable amount of data, so it is no
surprise that metadata accompanies them. Titles, authors, creation dates, page counts, and
user comments, as well as the names of tabs, tables, and columns, are all metadata that you can find
in spreadsheets and documents.
Websites
Every web page has a number of standard metadata fields, such as tags and categories, the site
creator’s name, the web page title and description, the time of creation, and any iconography.
Digital files
Usually, if you right-click on any computer file, you can view its metadata. This could consist of the
file name, file size, dates of creation and modification, and type of file.
Books
Metadata is not only digital. Every book carries standard metadata on its covers and inside,
including its title, the author’s name, a table of contents, publisher information, a copyright
notice, an index, and a brief description of the book’s contents.
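As a small illustration of the "Digital files" example above, file metadata can also be read programmatically. This Python sketch uses only the standard library. The helper name and the sample file are hypothetical, and creation time is left out because how it is reported varies by operating system.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def file_metadata(path):
    """Return basic metadata for a file: name, type, size, and last modified."""
    p = Path(path)
    info = p.stat()
    return {
        "name": p.name,
        "type": p.suffix,
        "size_bytes": info.st_size,
        "modified": datetime.fromtimestamp(info.st_mtime),
    }


# Create a throwaway sample file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sales.csv"
    sample.write_bytes(b"date,sales\n2019-01-01,100\n")
    meta = file_metadata(sample)

print(meta["name"], meta["type"], meta["size_bytes"])
```

None of these values are stored in the file's contents, which is what makes them metadata rather than data.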
Data as you know it
Knowing the content and context of your data, as well as how it is structured, is very valuable in your
career as a data analyst. When analyzing data, it is important to always understand the full picture. It
is not just about the data you are viewing, but how that data comes together. Metadata ensures that
you are able to find, use, preserve, and reuse data in the future. Remember, it will be your
responsibility to manage and make use of data in its entirety; metadata is as important as the data
itself.