Now You See It: Understanding and Using the Power of Exploratory Visualizations to Gain Insight into Your Audit Data

July 27, 2023
Brian Lewis

Modern ERP systems can contain millions if not billions of transactions and balances. Auditing that volume of data efficiently and effectively using traditional sampling and tabular methods is difficult if not impossible. Moving beyond the traditional charts and graphs have long been a tool to allow auditors to illustrate audit findings and results, exploratory data visualization techniques can allow you to gain insights and identify audit issues that spreadsheets and sampling cannot. In this webinar, you will learn three exploratory data visualizations that you can immediately incorporate into your audits.

Learning Objectives: After completion of this program you will be able to:

- Understand which datasets are best audited using exploratory visualizations
- Learn how to create three exploratory data visualizations (linear regressions, variances, and pattern identifications) to immediately begin using in your audits
- Understand how to link data visualization indicators with actual transactional or balance related findings
- Learn how to make data visualizations an understandable part of your audit report

Q&A from attendee-submitted questions and comments:

I think the additional important requirement, however, is that the auditor be able to explain how they retrieved the data. If the auditor pulled the wrong data or misinterpret the definition of data, the exploratory audit might create more chaos to prove that audit is wrong than the potential value of finding a possible risk - don't you think?

That is correct. The validity of the data being pulled is essential. A valid method such as regression applied to incorrect, incomplete, and misinterpreted data will, of course, result in more risk, not less. As is the case in any audit testing, a pre-step is ensuring that the data pulled must be tied out for completeness- ideally to trial balance account balances or subdata (e.g. purchases hitting accounts payable, or in the case of the inventory purchases related to COGS, purchases broken down into the components of inventory and non-inventory purchase). Data retrieval is always one of the biggest challenges in audit, and even more so ensuring the data is clean and correct. As to ensuring correct interpretation of the data (e.g. the data relationship between the chosen x’s and y’s) this is a key part of the developing the hypothesis- just as in the scientific method, the hypothesis should be based on educated/informed conjectures as to what a relationship should be based on an understanding of the business. As in our example, if you are selling inventory, it will need to be replenished. As with any statistical tool, care should be exercised to not take the result as an audit finding, but instead as an indicator that a key business process relationship has changed from what was expected. From there, the auditor should ask questions.

What about over/under fitting?

Overfitting or underfitting will result in bad conclusions. This is why it is very important to not only generate the visual, but to also check the statistics generated by the regression tool. The first (see slide 23) is Multiple R. Multiple R, if it is close to a value of 1 (say a tolerance of no less than .85) indicates that there is a valid relationship between the X and Y variable- in our example that inventory purchases (Y) are a dependent relationship to costs of goods sold (COGS). This is based on the assumption that inventory purchases will ultimately be related to what an organization is selling. That said, the auditor can’t stop there. Just because you can force a fit, the data may not fit well. The measure of whether the data fits the relationship well is R Square. As in Multiple R, R Square should be near 1 (say a tolerance of no less than .85). If you have a Multiple R and an R Square close to one (within a reasonable tolerance), then the regression is a good fit without unreasonable over/underfitting.

Are there example/use cases for service industries such as financial services?

Not that have been specifically developed for this presentation, but the same steps would apply.

1) Develop a hypothesis; Gather data for both prior and current periods; Test the hypothesis; Conclude on the hypothesis- is there actually a relationship between the values; Test the current period and analyze the results; Follow up on significant differences; and Present the results.

2) A good starting point might be identifying revenue sources (e.g. account fees might be a possible Y) and then making educated guesses as to what the driver(s) of those fees are (average account balances, overdraft, etc. might be possible Xs). Then gather the data and test the hypothesis(ses).

Excel is limited to a little over one million rows, but transactional data can frequently exceed that. What are some good workarounds for this limitation?

There are more complex software tools that can allow regression analysis on very large data sets (> 1 million rows), but there are a few workarounds for using large data within Excel. The best is Excel Power Query. Power Query is the ETL (extract, transform, and load) engine included with Excel for the purposes of allowing a user to work with an unlimited number of rows. A user can pull the large data set in Power Query and then use “grouping” to summarize the data into a manageable data set that can work within the Excel row limit. For example, if you are working with 10 million rows, you can group that data (transactions) into days, which will give you 365 rows. Just remember that if you group the data for your X values (Cost of Sales in our presentation example), that you also must group the Y values in the same manner (into days).

When communicating with auditees, how do you ask for explanations to linear regression variances?

It is generally better to avoid using statistical analysis terms when following up with an auditee. Instead, ask the questions in business user friendly language, going from general to more specific. For example, if based on the linear regressions-based variance analysis, inventory purchases appear unusual for a month, quarter, etc., it is better to ask “When looking over activity in Q1, I noticed that inventory purchases were higher (or lower) than compared to prior period(s). Were there any significant changes in the business or supply chain that this is related to?” Then based on the response, ask more specific questions.

Cookie	Duration	Description
_GRECAPTCHA	5 months 27 days	This cookie is set by the Google recaptcha service to identify bots to protect the website against malicious spam attacks.
cookielawinfo-checkbox-advertisement	1 year	Set by the GDPR Cookie Consent plugin, this cookie is used to record the user consent for the cookies in the "Advertisement" category .
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
CookieLawInfoConsent	1 year	Records the default button state of the corresponding category & the status of CCPA. It works only in coordination with the primary cookie.
elementor	never	This cookie is used by the website's WordPress theme. It allows the website owner to implement or change the website's content in real-time.
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Now You See It: Understanding and Using the Power of Exploratory Visualizations to Gain Insight into Your Audit Data

Four Components of Successful Divestitures

Now You See It- Gaining Audit Superpowers with BI Exploratory Data Visualizations

Five Common Audit Areas that Benefit from Data Visualizations

Streamline Your Audit with Secondary Ledgers