Overcoming Data Integration Issues

April 18, 2017


Data. It’s a simple word. Yet collecting it, analyzing it, and understanding it isn’t always so simple. For institutions of higher education, the difficulty usually comes from two circumstances: 1) data from one source does not always match data from an approved source, and 2) universities have multiple systems that perform specific, yet critical, functions.

Source of Truth vs. System of Record

System of Record is where the data is stored. Source of Truth is a validated, trusted data source. These can be – and often are – different to different people. This is where data from one source does not always match data from an approved source.

The ERP is generally referred to as the source of truth. While recruiting data, for example, might end up in the ERP, the recruiting application is probably the source of truth for the recruiters and admissions department. Which one is right? It really depends on the question being asked and how the data model is configured.

System of Record

The individual systems that supply data to the ERP are the systems of record for auditing purposes (but can serve as the source of truth for some reporting). The system of record may migrate to the ERP if the individual systems do not maintain historical records, while the source of truth may migrate from the ERP if a data warehouse or reporting system is in use by the institution.

Source of Truth

What is the source of truth? It is the data source that holds the correct data that can be used with confidence by end users. It provides the basis for reconciliation, a starting point for analysis, and validates the integrity of the data. Should there be a single data source? No. Should there be a single source of truth? Yes, but it might vary depending on the data representation.

A Changing Environment

Many institutions are moving beyond the web of spreadsheets and into a data-centric environment. Executives expect a quick turnaround on reports and visualizations. Relying on IT is difficult because of limited resources or lengthy turnaround times, so end users rely on their own tools to keep a source of truth intact. The result is multiple sources of truth, probably one per department or even per end user. This is not a workable solution in the long term.

Because of the possible complexity around the source of truth and lack of understanding of the data model, differences of opinion can arise on the validity of the numbers presented. As a result, department VPs can enter a meeting with numbers that do not agree and have no idea why. Thus, it is important that data be defined and the validation method understood, at least at a high level, so that meaningful decisions can be made using the data.

A Move Toward Visualization

Several methods are used to define the source of truth for data visualization. Using the ERP for visualizations is an option, but may not be the best one. The ERP is structured for transactions and not for reporting. The visualization effort may impact the performance of the ERP, slowing down work at the institution. Other options involve migrating the data from the ERP to a system that is designed for reporting and visualization. This raises the question about the source of truth: is it still the ERP or is it the reporting system? The source of truth could move to the reporting system, making the ERP the system of record. (Obviously, the reporting system needs to be validated so that the end users have confidence in the data.)

Multiple Systems

Usually, specific systems – separate from the ERP – are the leading edge of the data (while the ERP is updated at a later time). Data will look different between the systems, but that is OK as long as the discrepancy is noted and understood. This is especially true for visualizations that present aggregate data, where the details might not be known to all viewers of that data.

It’s common for institutions to have multiple systems that perform critical functions (one system for registration, one for admissions, one for recruiting, one for advancement, one for graduate tracking, etc.). These systems all need to be connected in some way. Typically, it’s done through complex queries and expensive data integrations. Even worse, the end users must learn multiple systems (multiple log-ins, multiple ways to do the same thing, etc.).

Migrating Data and Data Validation

When migrating data from one system to another, such as from a transactional ERP to a reporting system, data validation becomes very important. This process is often time-consuming for a reporting department and must be repeated multiple times, because the nature of the reports changes and the underlying data may change as well. The data validation process is more complex if the report being validated is an aggregate of the data source. The underlying data details are not always visible, and this makes validation difficult.
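One way to picture this kind of validation is a reconciliation check that compares row counts and totals per group between the two systems. The sketch below is only an illustration: the record layout, the field names (`term`, `credits`), and the sample rows are hypothetical, and a real check would read from the ERP and the reporting system rather than from in-memory lists.

```python
from collections import defaultdict


def summarize(rows, key, value):
    """Return {group: (row_count, total)} for a list of dict records."""
    summary = defaultdict(lambda: [0, 0])
    for row in rows:
        bucket = summary[row[key]]
        bucket[0] += 1            # count of detail rows in the group
        bucket[1] += row[value]   # running total for the group
    return {group: tuple(stats) for group, stats in summary.items()}


def reconcile(source_rows, target_rows, key, value):
    """Return the groups whose count or total differs between systems."""
    source = summarize(source_rows, key, value)
    target = summarize(target_rows, key, value)
    mismatches = {}
    for group in sorted(set(source) | set(target)):
        s = source.get(group, (0, 0))
        t = target.get(group, (0, 0))
        if s != t:
            mismatches[group] = {"source": s, "target": t}
    return mismatches


# Hypothetical extracts: the reporting copy is missing the fall term.
erp_rows = [
    {"term": "2017SP", "student_id": 1, "credits": 12},
    {"term": "2017SP", "student_id": 2, "credits": 15},
    {"term": "2017FA", "student_id": 1, "credits": 9},
]
reporting_rows = [
    {"term": "2017SP", "student_id": 1, "credits": 12},
    {"term": "2017SP", "student_id": 2, "credits": 15},
]

mismatches = reconcile(erp_rows, reporting_rows, "term", "credits")
print(mismatches)
```

Comparing counts alongside totals helps with the aggregate problem described above: two totals can agree by coincidence while the underlying detail rows differ, and a count mismatch points back to the group that needs a closer look.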

One option is to recreate the queries in a secondary tool before moving the data into the primary tool. Another option is to use files to transfer data from one application to another. While this method works, it is fraught with potential errors (e.g., users using the wrong version of a file in their data visualization). Both of these methods require time-consuming data validation.
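The wrong-file-version risk can be reduced, though not eliminated, by publishing a fingerprint alongside each validated extract so a user can confirm they hold the current copy. This is a minimal sketch of that idea rather than anything the tools above provide: the helper names and the sample CSV contents are hypothetical, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact file content."""
    return hashlib.sha256(content).hexdigest()


def is_current(content: bytes, published_fingerprint: str) -> bool:
    """Check a local copy against the fingerprint of the validated extract."""
    return fingerprint(content) == published_fingerprint


# Hypothetical extracts: the stale copy predates the fall term load.
current_extract = b"term,headcount\n2017SP,2\n2017FA,1\n"
stale_extract = b"term,headcount\n2017SP,2\n"

published = fingerprint(current_extract)

print(is_current(current_extract, published))
print(is_current(stale_extract, published))
```

A check like this only shows that a file is the one that was validated. It does not replace validating the data in the first place.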

A Possible Solution

A potential solution to this issue is to use validated data as the source of truth and to supply that data to visualization and analytics tools. This can be accomplished by using a freely and publicly available open standard format. Communicating in an open standard format simplifies the connection between applications, allowing easier integration for data visualization. An open standard format is a good choice because it does not tie the application to a specific vendor, which frees the institution from locking itself into a single vendor in order to access and display the source of truth.
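As a rough illustration, the sketch below publishes validated records as a JSON document that any visualization or analytics tool could read. JSON is an assumption here, since no specific format is named above, and the envelope fields (`source_system`, `validated_on`, `record_count`) and sample records are hypothetical.

```python
import json


def publish(records, source_system, validated_on):
    """Wrap validated records in a small envelope and serialize to JSON."""
    payload = {
        "source_system": source_system,   # where the data was validated
        "validated_on": validated_on,     # ISO date of the last validation
        "record_count": len(records),     # lets consumers detect truncation
        "records": records,
    }
    return json.dumps(payload, sort_keys=True)


def consume(document):
    """Parse the document and confirm it carries the advertised row count."""
    payload = json.loads(document)
    if payload["record_count"] != len(payload["records"]):
        raise ValueError("record count does not match the records supplied")
    return payload["records"]


# Hypothetical validated aggregate from the reporting system.
validated = [
    {"term": "2017SP", "headcount": 2},
    {"term": "2017FA", "headcount": 1},
]

document = publish(validated, "reporting", "2017-04-18")
records = consume(document)
print(records)
```

Because the consumer depends only on the format and not on a vendor's client library, the visualization tool can be swapped without changing how the source of truth is published.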

Peter Wilbur

Peter Wilbur is a Strategic Solutions Manager on the Client Experience Team at Evisions. Peter graduated from Northern Arizona University with a computer science degree in 1984. After working in several industries and with numerous companies, he joined Evisions in 2010 working on the support desk before moving to Professional Services, where he eventually came to serve as Professional Services Manager. Peter is a member of the Project Management Institute and a PMP. He enjoys spending time with his German Shepherds.


1 Comment

  1. Trent Armstrong

    Thanks, Peter! Excellent article. Helps to clarify the meaning and usages of the terms “System of Record” and “Source of Truth”. The System of Record should be the definitive source of input data, while the Source of Truth should be the cleansed and validated source of output data.
