CHAPTER 5
Introduction to Data Analytics in Accounting
S&S implemented a new AIS built on a relational database system. They collect and store all
of the data for the accounting system in one integrated system. However, as S&S continues
to grow, Ashton realizes they are creating data at an increasing rate. In addition to the ac-
counting system data, they have implemented other systems that track such things as social media activity and Internet traffic.
Working with all of the different systems has become cumbersome. Ashton and his team
are increasingly receiving requests for different data analyses, and often these requests re-
quire the team to join the disparate data together. For example, to help them run the busi-
ness, management wants reports that integrate social media, physical store information,
past performance, vendor and customer information, and predicted trends for the future.
The external and internal auditors want reports on compliance with financial reporting rules and other requirements, often in real time. It seems that everyone wants Ashton
to provide data, and they want it immediately.
At the same time data demands are increasing, other employees refuse to use the data
analyses that Ashton and his team currently prepare for them. These employees ignore these
analyses, saying they are hard to understand and they wish they could just do the analysis
themselves. Ashton’s team spends a great deal of time preparing these reports, so he is un-
certain why they are not having more impact and how he can allow employees to conduct
their own analyses.
Ashton realizes the company’s data needs are expanding rapidly and that he needs to come
up with a plan to deal with this challenge or the company will face serious growth and compli-
ance problems. He also realizes he needs to better understand how to produce useful data anal-
yses to help S&S succeed. Ashton has blocked out the next week to develop a plan to deal with
data at S&S. He plans to present to the owners a data plan that addresses these challenges.
Introduction
Data is proliferating at an exponential rate. This data proliferation is caused by increasing
computer processing power, increasing storage capacity, and increasing bandwidth. At present, none of these accelerators shows any sign of slowing. In a world of the Internet of Things (IoT) coupled with 5G bandwidth, more and more data will come from more and more sources.
What good is all this data? Can businesses leverage this plethora of data to extract insights
and competitive advantage? This chapter explores data analytics and accompanying toolsets
needed to turn this mountain of data into useful information.
Using data appropriately has become especially important for accountants. Accountants in
all different practice areas are using data in exciting ways. As a few examples:
● Auditors, both internal auditors and external auditors, can test full populations of trans-
actions rather than a small sample by using data analytics and automation. They are also
able to provide greater evidence that the company is complying with accounting rules by ex-
amining a greater variety of data.
● Corporate accountants use data to make better decisions such as how to accurately cost
products and services. Increased data allows them to make more accurate assessments of
risks and to identify opportunities to preserve and enhance value.
● Tax professionals face regulators who use analytics to identify tax returns that are likely
too aggressive. Data analytics also allow tax professionals to provide more real-time esti-
mates of tax consequences of business decisions. This increases the ability of tax profes-
sionals to influence top management business decisions.
“The skill sets required for the auditor of the future are always evolving. To help you stay ahead of the curve, the Center for Audit Quality, in collaboration with the leading public company accounting firms and the American Institute of CPAs, developed this roadmap to help students understand what they will need to become successful auditors.” So begins the document titled “Mindset, Behaviors, Knowledge & Skills Building a Roadmap for the Auditor of the Future,” a thought paper meant to guide students and educators on what skill sets students who enter the auditing profession will need in the future. Although directed to future auditors, this list mirrors advice given to other professional accounting groups. The document lists the following key mindsets, behaviors, and knowledge that future auditors should possess:
• Mindsets: analytics mindset, global mindset, growth mindset, innovative mindset.
• Behaviors: professional skepticism, critical thinking, logic, lifelong learning, embraces challenges, adaptability to new situations, adaptability when interacting with others, cultural awareness, curiosity, and leadership.
• Knowledge: proficiency in accounting and auditing, technology, psychology, and communication.
The past reputation of accountants being “just bean counters” no longer applies in the digital age. The business world expects accountants to do and to be more than they ever have been in the past. To read the full document produced by the CAQ, go to https://www.thecaq.org/mindset-behaviors-knowledge-skills-building-roadmap-auditor-future.
mindset - A mental attitude, a way of thinking, or a frame of mind.
analytics mindset - A way of thinking that centers on the correct use of data and analysis for decision making.

A mindset is a mental attitude, a way of thinking, or frame of mind. Mindsets are powerful collections of beliefs and thoughts that shape how you think and feel and what you do. In the accounting domain, the Center for Audit Quality (or CAQ)—a nonpartisan nonprofit public policy organization focused on improving external audit quality—suggests one critical mindset for future accountants to develop is the analytics mindset. The CAQ defines an analytics mindset as the “ability to visualize, articulate, conceptualize, or solve both complex and simple problems by making decisions that are sensible given the available information [and] ability to identify trends through analysis of data/information.” The Big 4 accounting firm EY provides a more tractable definition of the analytics mindset, used as a framework in this chapter. According to EY, an analytics mindset is the ability to
● Ask the right questions.
● Extract, transform, and load relevant data.
● Apply appropriate data analytic techniques.
● Interpret and share the results with stakeholders.
This analytics mindset definition closely resembles the scientific method of asking a question,
gathering data, testing the data, and reporting the results of testing. This time-honored method of
knowledge acquisition has produced many of the technological advancements enjoyed in modern
society. This chapter guides you to develop an analytics mindset in an accounting context.
The AICPA’s Audit Data Standards define standardized formats for accounting data. The standards are designed so that a base standard provides guidance for all types of accounting data and then specific standards deal with the general ledger and subledgers. Best practices from these standards will be referenced throughout this text.
This chapter discusses each component of the ETL process separately. Before studying
each of these components, realize that in practice these components often are mixed together.
For example, a data analyst might transform data while extracting it from a system. Repetitive
ETL processes can be fully automated so that the extracting, transforming, and loading are done entirely by a computer program in what appears to be a single, unified step. Learning
about each component is valuable, but as you gain experience, the separate components will
blend together when you work with data.
EXTRACTING DATA
Extracting data is the first step in the ETL process. This section presents the extraction process
as three steps: (1) understand data needs and the data available, (2) perform the data extraction,
and (3) verify the data extraction quality and document what you have done.
UNDERSTAND DATA NEEDS AND THE DATA AVAILABLE Before extracting data, data needs
should be carefully defined. This relates to the first step of the analytics mindset of asking the
right question. Defining the question well makes it easier to define what data is needed to ad-
dress the question. Without defining the data well early in the process, it is more likely that the
wrong data or incomplete data will be extracted. If this happens, the entire ETL process may
have to be repeated, wasting significant time and effort.
After defining the needed data, the next step is to understand the data itself, which entails
understanding things like location, accessibility, and structure of the data. Organizations have
many different approaches to storing data based on currency and frequency of access to the
data and the type and intended use of the data. Companies often organize their data by creat-
ing data warehouses, data marts, and/or data lakes. While several slightly different definitions
of these storage concepts are in practice, a data warehouse generally describes the storage of
structured data from many different sources in an organization.
structured data - Data that is highly organized and fits into fixed fields.
unstructured data - Data that has no uniform structure.
semi-structured data - Data that has some organization but is not fully organized to be inserted into a relational database.

Structured data refers to data that is highly organized and fits into fixed fields. Examples include accounting data like a general ledger, data in a relational database, and most types of spreadsheet data. In contrast, unstructured data is data that has no uniform structure. Examples include images, audio files, documents, social media, tweets, emails, videos, and presentations. In between these two, semi-structured data is organized in some ways but is not fully organized to be inserted into a relational database. Examples include data stored in CSV, XML, or JSON formats and various forms of streamed data (such as logs or machine-generated operation data). Data warehouses typically store only structured data or data that has been transformed into structured data.
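To make the distinction concrete, here is a small sketch (a hypothetical order record with invented field names, using Python’s pandas library) that flattens a semi-structured JSON record into the fixed rows and columns of structured data:

```python
import pandas as pd

# A semi-structured record: organized, but nested, so it does not drop
# directly into the fixed fields of a relational table.
orders = [
    {"order_id": 101,
     "customer": {"id": "C-7", "name": "S&S"},
     "lines": [{"sku": "TV-55", "qty": 2}, {"sku": "BLU-RAY", "qty": 1}]},
]

# json_normalize flattens the nesting into structured rows and columns.
flat = pd.json_normalize(orders, record_path="lines",
                         meta=["order_id", ["customer", "id"]])
print(flat)
```

Once flattened this way, the record could be stored in a data warehouse alongside other structured data.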
Data warehouses can be massive as they collect data from multiple sources across the
organization. As of 2014, Facebook had a data warehouse with 300 petabytes of data in
800,000 tables. To put that in perspective, 1 petabyte is the equivalent of 500 billion pages
of standard typed text. Biologists estimate that the human brain can hold a maximum of
2.5 petabytes of data.
data marts - Data repositories that hold structured data for a subset of an organization.

Given the immense size of data warehouses, it is often more efficient to process data in smaller data repositories holding structured data, called data marts. For example, a company may design a separate data mart for all data in geographic regions like North America, South America, and Europe; or companies can create data marts by function, such as a sales data mart and a marketing data mart. The smaller size of the data mart makes it faster to access the data. It also provides tighter internal control by making it easier to restrict user access to only the data relevant to their position.
data lake - Collection of structured, semi-structured, and unstructured data stored in a single location.

Finally, a data lake is a collection of structured, semi-structured, and unstructured data stored in a single location. When companies create data lakes, they typically attempt to add all data in the organization to the data lake as well as relevant data from outside the organization. For a car manufacturing company, a data lake may contain structured data like financial information but also unstructured data such as pictures of vehicles, movies of crash testing, performance logs from their cars, and social media data about the company.
dark data - Information the organization has collected and stored that would be useful for analysis but is not analyzed and is thus generally ignored.
data swamps - Data repositories that are not accurately documented so that the stored data cannot be properly identified and analyzed.

The size of data lakes can cause problems if they become so large that they allow important data to become dark data. Dark data is information the organization has collected and stored that would be useful for analysis but is not analyzed and is thus generally ignored. Data lakes can also become data swamps, which are data repositories that are not accurately documented, so that the stored data cannot be properly identified and analyzed. Data goes dark or turns into a data swamp for many reasons, including the organization not understanding the value of data analysis or not devoting sufficient resources to maintaining and analyzing the data. When top company officials have an analytics mindset, they are more likely to understand the value of data analysis and devote sufficient resources so that less data goes dark and data lakes do not become data swamps.
Companies design their data warehouses, data marts, and data lakes in many ways. For ex-
ample, a company can design a data warehouse to connect to all sources of transaction or struc-
tured data and then create a data mart using the data from the data warehouse. Alternatively, a
company may design data marts to connect to all sources of transaction data and then choose
to aggregate all of the data mart data into a data warehouse. Typically, data warehouses are of
primary importance and thus are used to build data lakes. However, companies can also create
their data warehouses, data marts, and data lakes to all independently access sources of data. A
visual depiction of some of these alternative structures is shown in Figure 5-1.
To properly extract data from various data repositories, it is important to understand the
design of the organization’s data lake, data warehouse, and data mart(s). Knowing the struc-
ture helps a user identify where the needed data resides and how to properly access it.
[Figure 5-1: Alternative designs connecting structured and unstructured data sources to data warehouses, data marts, and a data lake.]

metadata - Data that describes other data.

Typically, the best way to understand the structure of data is to consult the data dictionary. The data dictionary should contain metadata, which is data that describes other data. Examples of data dictionary metadata are the number of characters allowed in different fields, the type of characters allowed in fields (e.g., integer, text, date/time), and the format of data in a particular field. Maintaining up-to-date and correct metadata in a data dictionary can help prevent data from going dark and data lakes from turning into data swamps. Carefully examining the data dictionary to gather an understanding of the data and the underlying objects represents an important part of the ETL process.
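For illustration, a fragment of a data dictionary might look like the following sketch (the table and field definitions are invented for this example):

```python
# A fragment of a data dictionary: each entry is metadata describing
# one field of a hypothetical Customers table.
data_dictionary = {
    "CustomerID":   {"type": "integer",   "max_length": 10,
                     "description": "Primary key for the customer"},
    "CustomerName": {"type": "text",      "max_length": 50,
                     "description": "Legal name of the customer"},
    "CreatedDate":  {"type": "date/time", "format": "YYYY-MM-DD",
                     "description": "Date the record was created"},
}

for field, meta in data_dictionary.items():
    print(f"{field}: {meta['type']} - {meta['description']}")
```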
The previous discussion assumes the data for analysis resides within the organization.
An analysis sometimes requires data from outside of the organization. For example, com-
paring prices with a competitor requires the collection of the competitor prices, which may
be obtained by scraping price data from the Internet and adding it to the data environment.
Before extracting information from other sources, consider the moral, ethical, and legal (pri-
vacy and ownership) ramifications whenever data is gathered external to the organization.
data owner - The person or function in the organization who is accountable for the data and can give permission to access and analyze the data.
flat file - Text file that contains data from multiple tables or sources and merges that data into a single row.
delimiter - Character, or series of characters, that marks the end of one field and the beginning of the next field.

EXTRACT THE DATA With a firm understanding of data needs and the location and properties of the data, you are prepared to extract the needed data. Organizations often have internal controls that restrict access to different types of data. Data extraction may require receiving permission from the data owner. The data owner is the person or function in the organization who is accountable for the data and can give permission to access and analyze the data.
With permission from the data owner, the data will then need to be extracted into separate files or into a flat file. A flat file is a text file that contains data from multiple tables or sources and merges that data into a single row. Flat files are often preferable to separate files because they can make it easier and faster to analyze the data.
When including data in a flat file, a delimiter (also called a field separator) needs to be used to distinguish fields on a single line. A delimiter is a character, or series of characters, that marks the end of one field and the beginning of the next field. The Audit Data Standards recommend using a pipe delimiter, which is the vertical line between quotes (“|”). Pipes make useful delimiters because they are rarely used in other contexts. Two common, but less effective, delimiters are commas and tabs. Commas and tabs are less effective because they are used frequently in standard writing and thus can confuse a computer program. The program may not know if the comma or tab is meant to be a delimiter or part of the information contained in a field.
text qualifier - Two characters that indicate the beginning and ending of a field and tell the program to ignore any delimiters contained between the characters.

A text qualifier is two characters that indicate the beginning and ending of a field and tell the program to ignore any delimiters contained between the characters. For example, Microsoft Excel and Microsoft Access use a quote at the beginning and the end of a field to indicate that any delimiters contained in that field should be ignored.
characters. Table 5-1 provides three examples of delimiters and text qualifiers. The first example
shows a flat file that uses a pipe delimiter and quotes for text delimiters. When the file is im-
ported, the program can parse the data easily into the correct cells. In contrast, the second ex-
ample uses a comma delimiter and no text qualifiers. In this case, the extra commas contained
in the PerformanceReview field cause a problem when parsing the data into columns. The
computer program would try to create two extra columns, causing an error because there are
no column names for these erroneous columns. The third example fixes this problem by using
text qualifiers with the comma delimiter. In this case, the program can easily understand how
the data is to be parsed and placed in the appropriate column.
There are two additional items to note in Table 5-1. First, a program will often include text qualifiers only when they are necessary to aid in parsing. Note how text qualifiers are not included in the header row. Many programs would also omit text qualifiers for performance reviews if the field contained no delimiters (i.e., commas). Second, the use of a high-quality delimiter, such as the pipe symbol, makes it less important to use text qualifiers.
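The following sketch (hypothetical data, using Python’s built-in csv module) shows how a program uses a delimiter and a text qualifier—csv calls it a quotechar—to parse a flat file correctly even when a field contains commas:

```python
import csv
import io

# Comma-delimited data with quotes as the text qualifier. The commas
# inside the quoted PerformanceReview field are data, not delimiters.
flat_file = io.StringIO(
    "EmployeeID,Name,PerformanceReview\n"
    '101,Ashton,"Strong analyst, reliable, promotable"\n'
)

# csv calls the text qualifier a "quotechar".
for row in csv.reader(flat_file, delimiter=",", quotechar='"'):
    print(row)
# ['EmployeeID', 'Name', 'PerformanceReview']
# ['101', 'Ashton', 'Strong analyst, reliable, promotable']
```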
It is also important to consider how or if each row will be uniquely identified. As explained
in Chapter 4, primary keys are used to uniquely identify a row of data in a relational database
table. Recall that a primary key is an attribute or combination of attributes that uniquely identi-
fies each row in a table. Including primary keys in a data extraction can be valuable if you need
to extract more data and merge it with a previous extraction.
VERIFY THE DATA EXTRACTION QUALITY AND DOCUMENT WHAT WAS DONE Once the
data has been extracted, it is best practice to verify that the extraction was complete, accurate,
and approved. Batch processing controls studied in Chapter 13 can be useful to verify the
quality of the data. For example, counting the number of records in the extracted data set and
comparing the count with a count of records in the source data provides some evidence that the
extraction is complete.
An additional verification step used by auditors is to reperform the data extraction for a
sample of records and compare the smaller data extract with the full data extract. If the data is
the same in both the sample and the full data extract, it provides evidence that the extraction
process was done correctly.
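As a minimal sketch of these batch-style checks (the file and column names are hypothetical, using pandas), an analyst might compare record counts and a control total between the source and the extract:

```python
import pandas as pd

# Hypothetical files: the source table and the pipe-delimited extract.
source = pd.read_csv("sales_source.csv")
extract = pd.read_csv("sales_extract.txt", sep="|")

# Record counts give some evidence the extraction is complete.
assert len(extract) == len(source), "Record counts do not match"

# A batch control total over a dollar field gives further evidence that
# amounts survived the extraction intact (rounded to avoid float noise).
assert round(extract["Amount"].sum(), 2) == round(source["Amount"].sum(), 2), \
    "Control totals differ"

print(f"Extraction verified: {len(extract)} records")
```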
The final data extraction best practice is to create a new data dictionary containing all of the
information about the fields in the data extraction. Often this new data dictionary can be created
by copying items from the source file data dictionary and updating those fields based on any
changes made in the extraction process. It is good practice to include metadata that lists where
the data came from in this new data dictionary. When data added to the extraction are external to
the organization or were not previously defined in a data dictionary, it is important to accurately
define the data in the new data dictionary. This allows everyone who uses the data extraction
to correctly understand the data before they analyze it and helps avoid creating a data swamp.
TRANSFORMING DATA
Standardizing, structuring, and cleaning the data so that it is in the format needed for data
analysis is called the data transformation process. Given the amount of time spent on and
complexity of transforming data, Chapter 6 discusses this topic in detail. In this section, an
overview of the four-step transformation process is provided. The four steps are:
1. Understand the data and the desired outcome.
2. Standardize, structure, and clean the data.
3. Validate data quality and the achievement of data requirements.
4. Document the transformation process.
UNDERSTAND THE DATA AND THE DESIRED OUTCOME The first step in transforming data is
to understand the extracted data and the desired data outcome when the transformation process
is finished. If a data dictionary is not provided with the extracted data, the individual trans-
forming the data should create a data dictionary and make sure they understand the data they
obtained from the extraction process.
It is also important to understand the specifications for the transformed data, including
which file format the data should be in when transformed, which delimiters should be used,
what level of detail should be provided, how the data should be filtered, how the data should be
structured for analysis, and any other specifications needed for the loading process.
STANDARDIZE, STRUCTURE, AND CLEAN THE DATA This is generally the most time-
consuming part of the transformation process. Consider a small business that needs to perform
a relatively straightforward analysis comparing employee payroll expense from one year to the
next after implementing a new AIS. Each system outputs the data differently, aggregates the
data files at different levels, uses different delimiters, and uses different primary keys. The bulk
of time needed to perform the task is spent transforming the data to the point that the two years
of data can be properly joined for analysis.
Transforming data can involve a virtually endless combination of techniques to ensure the data is of high quality and is accurate and complete. Transforming data requires being resourceful and creative in understanding and fixing all the issues that exist in the data.
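A small illustration of this step (pandas; the column names, formats, and figures are invented): two payroll extracts disagree on column names, number formatting, and date formats, and must be standardized before they can be joined:

```python
import pandas as pd

# Hypothetical extracts: the old and new AIS disagree on column names,
# number formatting, and date formats.
old_sys = pd.DataFrame({"emp_id": [1, 2], "GrossPay": ["1,200.50", "980.00"],
                        "pay_date": ["01/15/2024", "01/31/2024"]})
new_sys = pd.DataFrame({"EmployeeID": [1, 2], "gross_pay": [1250.00, 990.00],
                        "PayDate": ["2025-01-15", "2025-01-31"]})

# Standardize: give both extracts one common schema.
old_sys.columns = ["EmployeeID", "GrossPay", "PayDate"]
new_sys.columns = ["EmployeeID", "GrossPay", "PayDate"]

# Clean: strip thousands separators and force consistent types.
old_sys["GrossPay"] = old_sys["GrossPay"].str.replace(",", "").astype(float)
for df in (old_sys, new_sys):
    df["PayDate"] = pd.to_datetime(df["PayDate"])

# The two years can now be properly joined for year-over-year analysis.
combined = pd.concat([old_sys, new_sys], ignore_index=True)
print(combined)
```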
VALIDATE DATA QUALITY AND ACHIEVEMENT OF DATA REQUIREMENTS Data can have er-
rors from when it is originally recorded. Given the complexity of the transformation process, it
is also easy to introduce errors during this process. Therefore, it is critical to validate the data
after it has been transformed to make sure the data is free from errors. The validation process
can be simple like performing a visual scan of the data or more complex such as performing an
audit of the data. Chapter 6 describes how to validate data quality.
Data may be of high quality, meaning all the errors are corrected, and still be useless if it
does not meet the requirements for the ETL process. A laboriously transformed data file about
customers is worthless if the intended analysis was supposed to be about vendors. Making sure
the data meets the data requirements is critical to a successful transformation process.
DOCUMENT THE TRANSFORMATION PROCESS The last step of the transformation process
is to once again update the data dictionary. Often different individuals extract, transform, and
load the data. Thus, documenting changes made between each step in the process is vital for
the next person in the process. Also, once data is transformed, it is often accessed by many
different people. Without high-quality documentation of the transformation process, these in-
dividuals may use the data inappropriately in their analysis.
LOADING DATA
Once the data has been structured and cleaned, it is ready to be imported into whatever tool is
used for analysis. If the data has been properly transformed, this process usually is relatively
quick and easy. However, there are a few important considerations when loading data.
First, the transformed data must be stored in a format and structure acceptable to the re-
ceiving software. Some data may need to be saved as a text file using delimiters. Alternatively,
the data may be placed into a schema such as XBRL or JSON or into a relational set of tables.
If the data is imported into a database that enforces referential integrity, for example, then it is
important to make sure the data is loaded in the correct sequence, so the referential integrity
rule is not violated.
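Here is a minimal sketch of the sequencing point (Python’s built-in sqlite3, with hypothetical tables): loading a child row before its parent violates referential integrity, so parents are loaded first:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce referential integrity

conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""CREATE TABLE Sales (
                    InvoiceID INTEGER PRIMARY KEY,
                    CustomerID INTEGER REFERENCES Customers(CustomerID))""")

# Correct sequence: load the parent table (Customers) before the child (Sales).
conn.execute("INSERT INTO Customers VALUES (1, 'S&S')")
conn.execute("INSERT INTO Sales VALUES (501, 1)")

# Loading a sale for a customer that has not been loaded yet is rejected.
try:
    conn.execute("INSERT INTO Sales VALUES (502, 99)")
except sqlite3.IntegrityError as err:
    print("Load rejected:", err)
```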
Second, programs may treat some data formats differently than expected. It is important
to understand how the new program will interpret data formats. Because some data format-
ting is treated differently than expected, the Audit Data Standards recommend stripping out
[Figure 5-2: Ernst & Young Foundation Recommended Data Analytics Skills]
especially in teaching the complex power of spreadsheet software” and “core skills with
both structured and unstructured databases (SQL, MongoDB, Hadoop, etc.).”
● A first statistics course that covers programming with a statistics program like R, cleaning data, data visualization tools like Tableau, Spotfire, or Qlikview, and skills related to exploratory data analysis (descriptive statistics, basic exploratory multivariate statistics).
● A second statistics course that expands coverage of the statistics software and deals with ad-
vanced topics like statistical inference, dealing with missing data or design issues, univariate
and multivariate regression, logistic regression, machine learning, and predictive tools.
Both firms agree that a critical skill required of all new hires is the ability to self-teach
new concepts, a desire to learn, and a desire to stay up on what is the “latest and greatest”
in the fast-changing world of technology. In addition, although there are lots of new and exciting technologies like blockchain and artificial intelligence, the basics of working with a spreadsheet program, understanding databases, and visualizing data are of preeminent importance. Similarly, mastery of basic techniques like ratio analysis, querying databases, and understanding basic statistics should not be overlooked in the rush to master more advanced techniques.
INTERPRETING RESULTS
In theory, interpretation of the output of your analytic tool is straightforward—describe what
the analytics or visualizations are saying. However, in practice, interpreting results can be
much more complicated. Interpreting results requires human judgment. Often, humans make
mistakes or interpret results in erroneous ways.
One common way people interpret results incorrectly relates to correlation and causation. Correlation indicates that two things happen at the same time. Causation means that the occurrence of one thing causes the occurrence of a second thing. Consider a company in Idaho selling
winter snow gear. In the hot month of June, they decide to launch a large marketing campaign
for the next six months to advertise a new line of snow gear, spending more money each month
to advertise. At the end of the year in the middle of the winter, the company compares the
advertising dollars and sales revenue for snow gear and finds a positive correlation—meaning
as more money is spent on advertising, more snow gear is sold. One might be tempted to sug-
gest that the marketing campaign caused snow gear sales. This may or may not be the case.
This data cannot determine whether snow gear sales were caused by or only correlated with
the marketing campaign. A competing explanation is that the increase in sales from June to
December was caused by the changing of seasons in Idaho from summer to winter. The com-
ing of winter may have caused more people to buy snow gear even without advertising.
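A small numeric sketch of the point (invented monthly figures, June through December): the correlation is easy to compute, but nothing in the calculation distinguishes the advertising explanation from the seasonal one.

```python
import pandas as pd

# Invented monthly figures, June through December.
months = pd.DataFrame({
    "ad_spend":        [1000, 2000, 3000, 4000, 5000, 6000, 7000],
    "snow_gear_sales": [500, 700, 1200, 2500, 5000, 9000, 14000],
})

# A strong positive correlation...
r = months["ad_spend"].corr(months["snow_gear_sales"])
print(f"Correlation: {r:.2f}")

# ...but the approach of winter rises over the same months, so the
# calculation alone cannot distinguish advertising from seasonality.
```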
A second common misinterpretation of results is noted in psychology research. Psychology
research provides evidence of systematic biases in the way people interpret results. One exam-
ple of a bias that influences interpretation of results is called confirmation bias—the tendency
of individuals to interpret evidence to support their desired belief or position. You might see this
in the context of a manager evaluating an employee’s performance. Suppose the manager has a
pre-existing desire to give the employee a positive rating because they are friends. The manager
is likely to interpret any information about the employee more positively than an independent
evaluator because of their pre-existing desire to rate their friend well. Thus, the employee may
receive an evaluation higher than merited because the information was not interpreted correctly.
You should strive to interpret results objectively, making sure you fully understand what
the results of analyses mean. This takes training and practice to do effectively.
SHARING RESULTS
data storytelling - The process of translating often complex data analyses into more easy-to-understand terms to enable better decision making.

Sharing data analytics results with others is often called data storytelling, or storytelling. Data storytelling is the process of translating often complex data analyses into more easy-to-understand terms to enable better decision making. Storytelling can help simplify all of the complexities that go into the process of gathering data, analyzing data, and interpreting data.
To tell a successful data story, first remember the question that initiated the analytics pro-
cess. As part of generating an appropriate question, the story designer considers the objectives
of the stakeholder. The story should answer the questions and achieve the objectives of the
stakeholder. That does not mean the designer has to agree with or even support the objective,
but the designer should design the story to discuss and to address the objective.
A second component of effective storytelling is to consider the audience. Things to consider about the audience include their experience with the particular data and with data analytics in general; how much detail they will want in an answer; whether the story will be presented in person or via a report or email; how important the question is (i.e., how much time they will devote to it); and whether the stakeholder needs a “quick” answer or an in-depth, detailed answer. Based on understanding what the audience wants, the story designer can design a story to meet the stakeholder’s needs.
data visualization - Use of a graphical representation of data to convey meaning.
data dashboard - Display of important data points, metrics, and key performance indicators in easily understood data visualizations.

A third component is the use of data visualizations. Data visualization is the use of a graphical representation of data to convey meaning. A shorthand name for data visualizations used in practice is “viz” or “vizs.” A common way to display data vizs is with a data dashboard, or dashboard for short. A data dashboard is a display of important data points, metrics, and key performance indicators in easily understood data visualizations.
Data visualization is a powerful way to communicate quickly and effectively. For example, a CIO might be concerned that costs for new computer technology are skyrocketing.
One less effective presentation technique might be to report to the CIO that computer technol-
ogy costs have increased 47% while all other expenses have increased 35% and sales have
only increased 14%. The same information can be included in a dashboard and more vividly
displayed in a line chart that shows computer technology costs, other expenses, and revenues
changing over time. This chart quickly highlights what the CIO cares about and is easier to
digest and remember than a wordy sentence.
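A minimal sketch of that chart (matplotlib, with invented index numbers consistent with the percentages above):

```python
import matplotlib.pyplot as plt

# Hypothetical indexed growth (base year = 100) over four years,
# consistent with the 47%, 35%, and 14% increases described above.
years = [2021, 2022, 2023, 2024]
tech_costs = [100, 115, 130, 147]
other_costs = [100, 112, 124, 135]
revenues = [100, 105, 110, 114]

plt.plot(years, tech_costs, label="Computer technology costs")
plt.plot(years, other_costs, label="All other expenses")
plt.plot(years, revenues, label="Sales revenue")
plt.ylabel("Index (base year = 100)")
plt.title("Technology costs are outpacing revenue growth")
plt.legend()
plt.show()
```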
Successful data visualization is both an art and a science. The art portion of visualization
relates to understanding how others will interpret the visualization, using creativity to convey
the message, and making a viz beautiful. The science portion of visualization relates to understanding basic principles of sound design. Good principles of visualization design include:
● Choosing the right type of visualization.
● Simplifying the presentation of data.
● Emphasizing what is important.
● Ethically representing the data.
AUTOMATION

automation - The application of machines to automatically perform a task once performed by humans.

Automation is the application of machines to automatically perform a task once performed by humans. For example, instead of manually copying and pasting data from a computer database into another program, a computer program can be written that automatically performs this task. Automation is often thought about in the context of manufacturing. Many automotive companies now use robots to perform welding on an assembly line rather than have a human perform this task. However, automation is not limited to manufacturing; it has been an important part of accounting, and business more generally, for a long time. The first computer systems that collected accounting data were programmed to automatically add up the credit amounts and debit amounts of journal entries to make sure they balanced. This replaced humans who previously performed this task.
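That earliest accounting automation amounts to only a few lines of code today (a sketch with invented entry data):

```python
# A journal entry as (account, debit, credit) lines -- invented amounts.
entry = [
    ("Cash",          5000.00,    0.00),
    ("Sales Revenue",    0.00, 5000.00),
]

# The automated check early accounting systems performed:
# total debits must equal total credits.
debits = sum(line[1] for line in entry)
credits = sum(line[2] for line in entry)
print("Balanced" if debits == credits else "Out of balance")
```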
robotic process automation (RPA) - Computer software that can be programmed to automatically perform tasks across applications just as human workers do.
bot - Autonomous computer program designed to perform a specific task.

Business automation ranges on a spectrum from very basic to very complex. Very basic automation requires a well-defined process so that a programmer can design all of the logic needed to perform every step of the task. Often, this type of automation is carried out with robotic process automation (RPA) software, which is computer software that can be programmed to automatically perform tasks across applications just as human workers do. A person using RPA designs an RPA bot, which is an autonomous computer program designed to perform a specific task. Rather than creating the bot using a scripted computer programming language, many of the leading providers of RPA software, such as Automation Anywhere, BluePrism, and UIPath, allow users to build bots with a simple click-and-drop interface. At the other end of the automation spectrum is automation that can recognize patterns, learn over time, and perform much more complex tasks. This type of automation is performed using tools like machine learning, artificial intelligence, and cognitive computing.
While some companies are working to create advanced automation solutions, most com-
panies are still working to automate tasks on the basic automation side of the spectrum. For
example, recent research finds that one of the Big 4 accounting firms automated 1 million
human hours of tasks in 2017 and that the firm planned to increase this amount tenfold over the next three years. These automated tasks will never again be performed by humans, but rather only by the “digital employees”—the bots. Indeed, a vice president of the RPA company UIPath said their goal “is to have a bot on every desktop,” expressing the desire that RPA be used by every employee to free the employee from doing mundane, routine work.
Companies are using RPA and other automation software to automate tasks within their
analytics processes. For example, in relation to asking the right question, the company Aera
Technology has developed technology to create a “self-driving enterprise.” That is, the tech-
nology tries to “make real-time recommendations, predict outcomes, and take action autono-
mously” or said differently, it tries to develop questions the business should be asking, answer
them without requiring human interaction, and perform the appropriate response. As an ex-
ample, the technology might examine past sales and predict that a particular store may have a
shortage of an item and then automatically ship the item from whichever storehouse would be
least expensive.
Professionals actively look for ways to automate the ETL process. RPA is one tool that can
be used to automate ETL tasks. For example, a student intern working for a small accounting
firm was tasked with consolidating tax data for 420 C-corporation tax returns. The task nor-
mally took previous interns a total of 430 hours to complete, with 80 of those hours checking
for human data entry errors. The intern spent 35 hours designing a bot to perform the consoli-
dation task. When finished, the bot ran overnight and completed the entire task with no errors.
Moving forward, the annual task can be completed in one night with the click of a few buttons
and require virtually no human hours of work.
Related to automating the data analysis portion of an analytics mindset, in 2019 Tableau, the data visualization software company, released a new “Ask Data” feature that allows the user to type in a question; the software automatically analyzes the data and produces a visualization that tries to answer the question. The user does not need to know how to use the program to build a visualization—the program automates the analysis and production of results.
Finally, the automation of sharing results with others is often performed with the creation
and continual update of data dashboards. The process of producing the data dashboard is typi-
cally automated so a user can review real-time data at any time.
Although automating activities can often improve efficiency and effectiveness, not all activi-
ties are candidates for successful automation. Activities that are best for basic automation are those
that are frequently performed, time-consuming, repetitive, rules based, and stable. Automating
frequent, time-consuming, repetitive tasks frees workers to focus on more value-adding tasks.
Be aware that there are risks and concerns related to automation. If a task is automated
incorrectly, the automation will efficiently enter incorrect data or alter data to be incorrect. It
is critical to make sure automation works exactly as desired on training data before activating
it on live data. Similarly, the automation needs to be periodically reviewed. If the task changes
and the automation is not updated, the automation can again cause problems.
In February 1996, “the match that changed [chess] history” was held between reigning world chess champion Garry Kasparov and Deep Blue, an IBM supercomputer. Kasparov came into the match as the heavy favorite—some claim he is the greatest chess grandmaster of all time. He also had history on his side: A chess-playing computer had never beaten a reigning world champion under normal chess tournament conditions. In the first game, the unthinkable happened: Deep Blue beat Kasparov. Kasparov went on to beat Deep Blue 4 games to 2, but the initial loss set in motion the end of human dominance in chess. In a rematch the following year, Deep Blue beat Kasparov 3½ games to 2½ games.
Since the Kasparov vs. Deep Blue chess matches, automated computer programs have beaten human experts in other tasks, including IBM’s Watson beating the all-time winningest Jeopardy contestant Ken Jennings and Google’s AlphaGo beating one of the world’s most dominant players, Lee Sedol, in the game Go. This last feat is especially impressive as Go has so many possible different moves that it is not possible to program a computer to try every combination to win the game, like programmers often program the computers to do when playing a human in chess.
What is the common thread among these different games pitting human vs. machine? In these cases, computers are programmed to better analyze and use data to make decisions than humans. However, the story suggesting computer domination of humans does not end there. After losing to Deep Blue, Kasparov started a chess tournament where “anything goes,” meaning you could enter the competition as an individual human, an automated computer program, or a team combining any combination of humans and computers. The fascinating outcome of these tournaments was that neither a human nor a computer strictly dominated; rather, the winners of these tournaments were usually the combination of a human and a computer program. In describing this winning combination, Kasparov introduced the idea of digital/human centaurs (a centaur is a half-human and half-horse mythical creature). Kasparov found that half-human and half-computer “centaurs” were the best combination for winning chess competitions.
The business world readily embraces this “centaur” approach. Accounting firms are beginning to refer to their human employees and their digital employees. Digital employees are computer programs that complete a task, which was often previously performed by a human. Business teams are often made up of different experts, with a technology and data expert included to harness the power of technology. All trends point to a future where humans and computers will interact to perform business tasks better than either can do alone.
Finally, the human element of automation should be considered. Many employees are con-
cerned when a company implements automation because they worry they will lose their jobs.
Automation can be used to reduce headcount, but it also can be used to reduce boring, repetitive
work so that employees are freed to do more interesting, value-added tasks. An organization
needs to consider the human response to automation before beginning the automation process.
AIS in Action
CHAPTER QUIZ
1. Unstructured data internal or external to the organization is usually gathered and stored in
which of the following?
a. data dictionary
b. data lake
c. data mart
d. data warehouse
2. Which one of the following items would be the best primary key for a table containing
information about customers?
a. customer ID
b. customer full name
c. customer phone number
d. customer email address
3. Which one of the following characters would be the best delimiter (the delimiter is listed
between the quotes)?
a. “,”
b. “@”
c. “|”
d. All of the above
4. An online sales company designed a program to evaluate customer purchases. After each
purchase, the program analyzes which product the customer is most likely to buy next and
e-mails the customer a coupon for a discount on this new product. What type of analytics
is this an example of?
a. descriptive analytics
b. diagnostic analytics
c. predictive analytics
d. prescriptive analytics
5. When sharing the results of an analysis, which of the following is NOT a key principle to
follow?
a. Simplify the presentation of data.
b. Present the visualization in a timely manner.
c. Ethically represent the data.
d. Emphasize what is important.
6. Which of the steps of an analytics mindset is the most difficult to automate?
a. Ask the right questions.
b. Extract, transform, and load relevant data.
c. Apply appropriate data analytics techniques.
d. Interpret and share the results with stakeholders.
7. All of the following characteristics of data are important in distinguishing big data from
regular data EXCEPT:
a. velocity
b. variety
c. visualization
d. volume
8. You are given an extract of one field from a database. The field has the value “11815 N.
Diamond Dr.” Which type of data is contained in this field?
a. structured data
b. unstructured data
c. semi-structured data
d. None of the above
DISCUSSION QUESTIONS
5.1 The first step of an analytics mindset is to ask the right questions. How do you learn
how to ask the right questions? How would you teach someone else how to ask the right
questions?
5.2 This chapter discusses several different ways to structure data warehouses, data marts,
and data lakes. Discuss the diagrams listed in the book or diagram your own structures
for data warehouses, data marts, and data lakes, and discuss the pros and cons of each
structure.
5.3 Companies are automating many accounting tasks. Is automation good or bad? Consider
this question from the view of accounting students, accounting practitioners, other busi-
ness professionals, and society as a whole. What should be done to achieve the good
aspects of automating accounting tasks while minimizing the poor aspects?
5.4 The end of this chapter suggests that data analytics are not always appropriate for a deci-
sion context. Identify three unique business situations for which data analytics may not
be appropriate. Identify why data analytics are not appropriate in these situations and
how a decision maker should make their decision without using data.
PROBLEMS
5.2 For each of the following examples, indicate whether the data is structured, semi-
structured, unstructured, or a mix of each. Explain your answer.
1. A company runs many social media campaigns to increase sales. The company col-
lects data about the amount spent on each ad campaign, the number of people who
click on each ad, whether each person clicking on an ad completed a purchase, and the
location (city and country) of each person who clicked on an ad.
2. A company performs performance evaluations of all its employees each quarter. The
evaluations include comments made by peers of each employee, a supervisor’s write-
up of performance during the quarter with a rating on a 5-point scale, and perfor-
mance metrics relative to their job title (e.g., sales completed for sales people, units
repaired for repair people, etc.).
3. A call center records all phone calls between employees and customers. The company
stores the data for review if any allegations are made of inappropriate employee behavior.
4. A company scrapes data from a review website where customers can write in about
products they have purchased. The company analyzes each of the reviews but only
records the number of words in the review, a rating of the tone of the review (scores
from -3 to +3), and the number of stars given (1 to 4).
5. A university tracks students’ course registrations each semester. The university records the course number, course description, and course credit hours for each student.
6. A mechanic keeps a digital catalog of all part numbers and part descriptions for each
type of vehicle the company services.
7. A non-profit organization keeps a record of all past donors. The organization tracks
names, dates of donations, amount donated, and additional comments about the donor
and their donation.
8. An online retailer tracks IP addresses from each web visit. The retailer monitors IP
addresses to see if visits are coming from IP addresses known to hack company websites.
9. A company scrapes data from a review website where customers can write in about
products they have purchased. The company stores each of the written reviews.
10. A company owns a football stadium. During games, the company takes high-
definition photos of all fans. The company stores these images and plans eventually
to use advanced technologies to see which fans wear the team’s colors so they can
market clothing to them.
5.3 Consider the following scenario. You are a tax professional meeting with a new client to
help them make strategic tax planning decisions. You know that clients can choose from
a variety of tax positions that vary in aggressiveness. A very conservative tax position
will result in paying higher taxes but reduce the likelihood of an IRS audit and fine. In
contrast, an aggressive tax position will result in lower taxes but increase the likelihood
of an IRS audit and fine. As this is a new client, you want to assess their appetite for tak-
ing aggressive versus conservative tax positions. An intern prepared a list of questions
to ask the client. Review the list of questions below.
REQUIRED
For each question, decide whether it is a SMART question or not. If not a SMART
question, then rewrite the question and explain why you changed the question. Consider
each question independent of the others, meaning redundancy between questions is
okay, as you would not likely ask all of these questions. This exercise helps you practice
developing SMART questions.
1. You want to take an aggressive tax position, right?
2. Why do you pay taxes?
3. What do you think Congress should do to reform personal income taxes in this country?
4. How much money do you want to save on taxes?
5.4 Consider the following scenario. You are a new staff internal auditor for a national res-
taurant chain. Your manager assigns you to visit a new restaurant location that is per-
forming poorly. In preparing for your visit, you search customer review websites and
find that many customers are complaining about the cleanliness of the restaurant. When
you alert your manager to this, she is concerned about potential health code violations.
She asks you to prepare a list of questions that you will ask the employees related to the
cleanliness of the restaurant. Your manager wants to assess the risk of health code violations and understand why the restaurant’s condition is offending customers.
REQUIRED
Prepare a series of questions to ask the employees. Remember the SMART principles
as you design your questions. For each question, list to whom you plan to ask the ques-
tion, and discuss how the question applies one or more of the SMART principles. Each
question does not need to apply all of the SMART principles, but your combination of
questions should address all of the SMART principles.
5.5 For each of the following situations, indicate whether the analysis is an example of a
descriptive analytic, diagnostic analytic, predictive analytic, or prescriptive analytic.
1. An accounting firm is trying to understand if its external audit fees are appropriate.
They compute a regression using public data from all companies in their industry to understand the factors associated with higher audit fees.
2. A self-driving car company uses artificial intelligence to help clean its historic social
media data so they can analyze trends.
3. An airline downloads weather data for the past 10 years to help build a model that
will estimate future fuel usage for flights.
4. A shipyard company runs a computer simulation of how a tsunami would damage
its shipyards, computing damages in terms of destruction and lost production time.
5. An online retail company tracks past customer purchases. Based on the amount cus-
tomers previously spent, the program automatically computes purchase discounts for
current customer purchases to build loyalty.
REQUIRED
For each of the other three data extracts, do the following:
• Describe if delimiters are present, and if so, what they are.
• Describe if text qualifiers are present, and if so, what they are.
• Describe the steps you would take to prepare each extract for importing into the
company’s database.
Correct Format for Import
1. Extract 1
2. Extract 2
3. Extract 3
units. They also want to know if they are producing more units than the minimum run
requirement for each production run. The minimum run requirement is the number of
units that must be produced to be profitable for each type of inventory item. Go to
the student download page at http://www.pearsonglobaleditions.com and download the
files labeled “P5-7tbl_Batch.csv”, “P5-7tbl_Products.csv”, “P5-7tbl_ProductsBatch
.txt”, and “P5-7.xlsx”. The first three files are exported from systems that produce dif-
ferent output formats.
REQUIRED
Import each of the csv and txt files into the Excel file. Import each file into the
sheet with the same name as the file. Pay attention to the delimiters in each file
and whether the file uses text qualifiers or not. Combine the data from the three
sheets to the tab labeled “P5-7tbl_AllData”. Once you have all the data gathered
correctly into the “P5-7tbl_AllData” tab, answer the following questions, listing
each answer on the appropriate sheet (e.g., Solution1 sheet, Solution2 sheet, and
Solution3 sheet):
1. How many units did each location produce?
2. How many units did each manager produce?
3. How many product batch runs produced fewer units than the minimum run size?
5.8 Excel Project: Well-Designed Visualizations
The following visualization (viz) shows all state taxes and fees collected for the state of
New York from 2014 to 2018, inclusive.
REQUIRED
For this viz, do the following:
1. Describe ways you would change the viz to (a) simplify the data presentation and (b)
properly emphasize the objective of the stakeholder. Assume the objective of the stake-
holder of this viz is to compare how each type of tax collections changes over time.
[Visualization: a chart of total New York State tax and fee collections by tax type (Corporation and Business, Personal Income, Property Transfer, and others) for 2014–2018, with a vertical axis labeled in scientific notation (0.E+00 through 6.E+10) and the year labels 2014–2018 repeated under every tax type.]
The following case was developed by the Ernst & Young Academic Resource Center, which is
sponsored by the Ernst & Young Foundation. Robotic process automation is growing in impor-
tance in the accounting profession. Several RPA vendors provide free trial software, includ-
ing Automation Anywhere (https://www.automationanywhere.com/lp/community-edition) and
UIPath (https://www.uipath.com/freetrial-or-community). Download one of these software
packages and complete the following case.*
OVERVIEW
Wood’s Amazing Woods Inc. (Wood’s), a small company in Idaho, sells wood products. Founder Jason Woodworth has focused most of his attention on providing excellent service and products but not on keeping his accounting system updated. Wood’s currently has a manual billing process to bill customers for the wood products sold. That work is performed by a billing specialist, Emily Young, a recent accounting graduate. Jason would like to completely automate his currently manual billing process to drive productivity and cost savings. Jason also wants to provide Emily with the opportunity to do some data analytics on their financial information and reduce her overtime during the end of the month for better work–life balance.
You will build a bot for each case part, starting with a simple billing scenario in part I. The bot you will build in each subsequent case part builds on the first bot and is more complex than the previous bot, as more complexity will be added to the billing process. Recall that the basic function of a bot is to recreate the steps a human would do in a process.
This case uses a simplified automation process that is modified from what would happen in an actual accounting environment, which typically uses more sophisticated software than the Excel files used in this case; however, the process of creating bots is the same for simple and more sophisticated software. In addition, companies typically have solutions to provide automated billings. This case uses this simplified scenario to teach the introductory principles of building bots and does not necessarily demonstrate a typical process that would be automated in practice. Although the case simplifies the software and the process, the basic skills of building an RPA are applicable to all automation settings.

BEFORE YOU BEGIN
● Save all of the provided Excel files for the case (go to the student download page at http://www.pearsonglobaleditions.com and download the files) into the same folder on your hard drive. You should build the bot so that the first sequence allows the user to input the file path for the folder where the files are located. Choose a file path that does not have too long of a name. Store this file path as a variable and then use this variable as a reference to load/save/move/etc. any files. This way, the bot can be moved to other computers and still function (i.e., be graded by your professor).
● Innovation_mindset_case_studies_RPA_Billing_InvoiceData1.xlsx
● Innovation_mindset_case_studies_RPA_Billing_InvoiceData2.xlsx
● Innovation_mindset_case_studies_RPA_Billing_InvoiceData3.xlsx
● Organize your thinking in a flowchart before you begin programming your bot. This makes it more likely that you will not forget key parts of the process. It also allows you the ability to develop your bot in segments or sections, which can make troubleshooting much easier.
OBJECTIVE
Your task is to build a bot that automates the billing process. Ultimately, a billing specialist should be able to open the RPA software and run the bot, which should perform all the steps described in the manual process (following) without any human interaction, with one exception. When the bot is run, it should require the user to enter a file folder location where the files currently are and will be saved when done. This allows the bot to be transferable to other computers and still run successfully.

PART I
Wood’s Amazing Woods Inc. currently uses the following manual process to bill its customers:
● The billing specialist, Emily Young, opens the Excel file with the invoice data, Innovation_mindset_case_studies_RPA_Billing_InvoiceData1.xlsx.
● She reviews the data and opens the Excel invoice template, Innovation_mindset_case_studies_RPA_Billing_MasterInvoice.xlsx.
● Emily creates a unique invoice for each record in the invoice data.
● She saves the updated invoice template file for each customer as “Innovation_mindset_case_studies_RPA_Billing_MasterInvoice_InvoiceNumber.xlsx” (where InvoiceNumber is replaced by the invoice number being billed).
In the invoice data file, there is only a single record.

REQUIRED
● Your bot is required to create an invoice for the one invoice.

PART III
Wood’s Amazing Woods Inc. currently uses the following manual process to bill its customers.
● The billing specialist, Emily Young, opens the Excel file with the invoice data, Innovation_mindset_case_studies_RPA_Billing_InvoiceData3.xlsx. There are multiple records in this data.
● She then reviews the data and opens the Excel invoice template, Innovation_mindset_case_studies_RPA_Billing_MasterInvoice.xlsx.
● Emily creates a unique invoice for each record in the invoice data if the customer has not already been billed.
● Customers who have been billed have dates in the column labeled “Billed.” If the field is empty, then the customer needs to be billed.
● If a customer has already been billed, Emily skips this customer and goes on to the next customer.
● She then saves the updated invoice template file for each customer as “Innovation_mindset_case_studies_RPA_Billing_MasterInvoice_InvoiceNumber.xlsx” (where InvoiceNumber is replaced by the invoice number being billed).
● Once she is finished creating the invoices for each customer, she opens the invoice data Excel file, Innovation_mindset_case_studies_RPA_Billing_InvoiceData3.xlsx, and updates the “Billed” column with the current billing date.
● Finally, she saves the file as Innovation_mindset_case_studies_RPA_Billing_InvoiceData3_Complete.xlsx.

REQUIRED
● Your bot is required to create an invoice for each customer who should be billed. The bot should be programmed so it can complete the task for any number of records included in the invoice data. In other words, you should not hard code all the values—instead, you should use programming to loop through all of the rows and create invoices accordingly.