In the not-too-distant past, doing anything meaningful with AI/ML or data science required highly skilled statisticians and programmers — so-called “purple unicorns” who also had the business functional knowledge to develop and tune sophisticated mathematical models. Now, however, many off-the-shelf tools used for daily tasks such as forecasting, data entry, reporting and FP&A include advanced AI/ML capabilities that anyone can use with very little technical ability.
AI/ML Runs on Data
As data science and AI/ML tools become simpler to use, so must the data. To fully empower knowledge workers to change the way they work, there are two data quality imperatives:
1. Have good data
2. Make it available
With all of the advances in enterprise data technologies, you’d think we would have solved the issues around data quality and availability by now. But we all know that is not the case. We still face the same problems — only now there are a lot more data and systems adding to the complexity. AI/ML can only amplify these problems unless they’re addressed. So, what does good data mean?
Data quality encompasses several key components:
Relevance
Data that does not support critical business KPIs is of little value. Relevance ties directly to the overall business strategy and is the key pillar driving the enterprise data strategy. A common approach begins with strategic objectives and breaks them down into an agreed-upon set of metrics that measure progress toward those objectives. These KPIs are then further decomposed into the data sets required for their operational support and daily monitoring. This helps ensure that the data used for AI/ML purposes will be the most relevant to the business.
Completeness
Once relevant data is identified, the next task is to ensure it is complete. The concept is just as it sounds: a value that is supposed to be present is missing. Incompleteness typically originates at the point of data entry, where capture is incomplete and fields are left blank. Ideally, this should be fixed at the source, but that is not always feasible, especially when dealing with data from legacy systems or other integrations.
The “go-forward” solution is to actively govern new data via audits that catch these errors so they can be fixed at the source soon after they occur. For legacy values, standard statistical techniques can make reasonable predictions for the missing values, and many AI/ML tools have built-in imputation capabilities for this purpose.
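As a minimal sketch of the idea, the example below fills gaps in a legacy extract using a simple statistical fill (a median) in pandas. The data set and column names are invented for illustration; real imputation would use a model appropriate to the field and, crucially, flag which values were imputed so audits can trace them.

```python
import pandas as pd

# Hypothetical HR extract with gaps left by legacy data entry
df = pd.DataFrame({
    "employee_id": [101, 102, 103, 104],
    "weekly_hours": [40.0, None, 32.0, None],
})

# Flag the gaps first so downstream audits can distinguish real from imputed values
df["weekly_hours_imputed"] = df["weekly_hours"].isna()

# A simple statistical fill: replace missing values with the median of observed ones
df["weekly_hours"] = df["weekly_hours"].fillna(df["weekly_hours"].median())

print(df)
```

The flag column is the governance piece: without it, imputed values silently become indistinguishable from captured ones.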
Consistency and Accuracy
If the data is bad, then AI/ML will only amplify its shortcomings. Worse, users will not trust it and give up, thus branding AI/ML as a failure. It only takes one bad experience for this to happen. Solving this is tricky but possible.
It starts with agreeing on the definition of each data element. Take headcount, for example. It sounds simple but can be computed in various ways depending on how you define it (headcount vs. FTE). It gets more complicated still when you consider time dimensions (by month, pay period, fiscal period, etc.). The same data element must be consistent across all reports to be trusted by users.
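To make the definitional gap concrete, here is a toy calculation over the same roster (the four-person roster and the 40-hour standard week are assumptions for illustration):

```python
# Weekly hours for the same four people: two full-timers, two part-timers
roster_weekly_hours = [40, 40, 20, 10]

# Definition 1: headcount -- every person counts as one
headcount = len(roster_weekly_hours)

# Definition 2: FTE -- hours normalized to a standard 40-hour week
fte = sum(hours / 40 for hours in roster_weekly_hours)

print(headcount)  # 4
print(fte)        # 2.75
```

Two reports built on the same roster can legitimately show 4 or 2.75; unless the metric definition is agreed upon and shared, both look wrong to someone.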
A common data dictionary with pre-defined metrics will help in solving this problem. It’s widely believed that data scientists spend 80% of their time sourcing and preparing data for modeling and reporting. The less of this a user has to do, the better the output will be.
This brings us back to the final piece of the puzzle: making the data readily available to non-technical business users. Although familiar office and ERP tools have new built-in AI/ML capabilities and have reduced reliance on IT and highly skilled data scientists, challenges remain. Even though an enterprise data warehouse may exist, end-users still need the ability to create meaningful data products from it.
One popular option is commonly called a “data mesh.” This user-centric concept puts all relevant data in the hands of users via self-service data catalogs and business-oriented data layers that hide the underlying complexities and expose the properly defined data elements, thus removing the need for data wrangling and manipulation.
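As a toy illustration of such a business-oriented data layer, the sketch below uses an in-memory SQLite database (all table, column, and view names are invented) to publish a pre-defined FTE metric as a named view, so users query the governed metric without ever touching the underlying joins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw operational tables the business user should never have to join
    CREATE TABLE employees (id INTEGER, dept TEXT, active INTEGER);
    CREATE TABLE assignments (employee_id INTEGER, weekly_hours REAL);
    INSERT INTO employees VALUES (1, 'Finance', 1), (2, 'Finance', 1), (3, 'Supply Chain', 0);
    INSERT INTO assignments VALUES (1, 40), (2, 20), (3, 40);

    -- The "data product": an agreed-upon FTE metric, defined once and exposed by name
    CREATE VIEW fte_by_department AS
    SELECT e.dept, SUM(a.weekly_hours) / 40.0 AS fte
    FROM employees e
    JOIN assignments a ON a.employee_id = e.id
    WHERE e.active = 1
    GROUP BY e.dept;
""");

# End users query the governed metric, not the raw tables
rows = conn.execute("SELECT dept, fte FROM fte_by_department").fetchall()
print(rows)
```

The join logic, the active-employee filter, and the 40-hour normalization all live in one governed definition; no user has to rediscover (or re-invent) them.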
Ease of Use, Thy Name Is Data Governance
If this sounds a lot like data governance, you’re right. A data governance program encompasses all of these concepts and puts in place policies, controls and organizational data owners (stewards) who are responsible for the enterprise’s data assets. The scope of these efforts might include the entire enterprise, so it may not be feasible for a single department, such as Finance or Supply Chain, to undertake on its own.
All is not lost, however. Simple-to-use data marts with data quality tools and catalogs can be implemented on a smaller scale and can increase the quality and success of the departmental AI/ML program.
Careful planning and an openness to experimentation – and to incorrect results – are part of building a high-quality AI/ML program at the department level. Take the time to understand the relationship between the incoming data and the AI/ML results, and manage expectations. Not every AI/ML effort will succeed, which is why focusing on the underlying data produces better outcomes and more successful AI/ML programs.
Putting It All Together
Our series of articles has covered some of the foundational aspects of embarking on the AI/ML journey for the office of the CFO. I’ve given you some ideas on how to define AI/ML, how to use it, and how to prepare your data. Next, we’ll tie it all together to help you get the most out of AI/ML for your finance and accounting team.
Subscribe to our Now of Work newsletter to catch the next article, and contact us if you need help with your data quality and governance to make the most of AI/ML-enabled finance transformation.