More and more applications (software) are residing in the cloud. In the past, these applications were often viewed as black boxes. This means the end user sees an interface but has little insight into the processing that goes on within them. But that approach has been changing over the last several years. End users now want – or need – to have more control over their data. They must now be able to reach and pull data from the cloud.
Pulling Data Out of the Cloud
At first, data stored in a remote application was provided to end users via a file, generally a CSV (comma-separated values) file. The file could be easily imported into a data structure at the end user’s institution and used for additional reporting. This practice remains in frequent use today.
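As a rough illustration of the file-based approach, the sketch below reads a CSV export into a list of dictionaries using Python's standard csv module. The column names and sample rows are hypothetical; a real export would use whatever columns the vendor provides.

```python
import csv
import io

# Hypothetical export contents; a real file would come from the vendor.
SAMPLE_EXPORT = """student_id,name,credits
1001,Ada Lovelace,12
1002,Alan Turing,9
"""

def load_export(text_stream):
    """Read a CSV export into a list of dictionaries keyed by column name."""
    reader = csv.DictReader(text_stream)
    return [dict(row) for row in reader]

# An in-memory stream stands in for an opened file here.
rows = load_export(io.StringIO(SAMPLE_EXPORT))
```

Note that every value arrives as a string, so numeric and date columns still need to be converted before reporting. To read an actual file, an opened file object (for example, one opened with newline="" and an explicit encoding) could be passed in place of the in-memory stream.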
Another, more modern, method of retrieving data from a remote application is to use an interface provided by the vendor. This approach tends to be more automated, as the interface is generally a RESTful API (meaning the end user’s systems send requests over HTTP and the API responds with the requested data, typically in a structured format such as JSON).
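A minimal sketch of this kind of retrieval, using only Python's standard library, might look like the following. The endpoint URL, the bearer-token authentication, and the shape of the JSON response are all assumptions made for illustration; an actual vendor's documentation would define these. The opener parameter is a hypothetical convenience that lets the function be exercised without a network connection.

```python
import io
import json
import urllib.request

def fetch_records(url, token, opener=urllib.request.urlopen):
    """GET a JSON payload from a (hypothetical) vendor endpoint."""
    request = urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Accept": "application/json",
        },
    )
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Offline demonstration: a stand-in opener returns a canned response
# instead of contacting a real server.
def fake_opener(request):
    return io.BytesIO(b'[{"student_id": "1001", "credits": 12}]')

records = fetch_records(
    "https://vendor.example/api/records",  # placeholder URL
    "demo-token",
    opener=fake_opener,
)
```

Because the call is just a function, it can be scheduled to run automatically, which is where the automation advantage over manual file exports comes from.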
The data surrounding a cloud application needs to be treated the same as the data surrounding any on-premise application. The data needs to be characterized and defined, and the frequency of its updates needs to be noted.
The characterization of the data going into a cloud application is important. This is the data it will use to execute its function. So, the format and type of data fields are important to know, both for the data entry process and as knowledge of what to expect for the output. The completeness of the data entry is important. A process ensuring that the data is complete and correct during input is vital to the success of the remote application.
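One way to picture a completeness check on input data is the sketch below, which flags records that are missing required values before they are loaded. The field names are hypothetical, and a real process would likely also check types and formats.

```python
REQUIRED_FIELDS = ("student_id", "term", "credits")  # hypothetical field names

def is_blank(value):
    """Treat None and empty or whitespace-only text as missing; 0 is a real value."""
    return value is None or str(value).strip() == ""

def find_incomplete(records, required=REQUIRED_FIELDS):
    """Return (row_index, missing_fields) pairs for records lacking required values."""
    problems = []
    for index, record in enumerate(records):
        missing = [name for name in required if is_blank(record.get(name))]
        if missing:
            problems.append((index, missing))
    return problems
```

Running a check like this before each load gives the data entry team a concrete list of rows to correct, rather than discovering gaps later in a report.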
The data coming out of the application needs to be characterized as well. There may be differences in date formats, number formats, string output formats, etc. Those differences need to be known so that appropriate measures can be taken to use the data correctly during reporting.
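To make the point concrete, the sketch below normalizes dates and currency amounts that arrive as text. The list of accepted date formats and the US-style number separators are assumptions; they would need to match what the application actually emits.

```python
from datetime import datetime
from decimal import Decimal

# Assumed output formats; the real list depends on the application.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")

def parse_date(text):
    """Try each known format in turn and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")

def parse_amount(text):
    """Convert text such as '$1,234.50' to a Decimal (assumes US-style separators)."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)
```

A value like 03/01/2023 shows why the format has to be known rather than guessed: this sketch reads it as March 1, but an application that writes the day first would mean January 3.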
While the different characteristics of data going into and coming out of a cloud application are important to know, the actual definition of the data fields is important as well. If there’s confusion or disagreement about the data going into an application, then there almost certainly will be disagreement with any report-based conclusions. The data fields used for input to the cloud application must be defined.
An example of such a definition can be drawn from a budgeting application. What initial budget number or numbers should be loaded? There’s the original budget, the adjusted budget, and the adopted budget. If all parties understand which budget is being loaded, then the results can be clearly discussed and there is less chance of misinterpreting the data. Such definitions are the responsibility of the data owner and should be part of the data governance term definitions.
Frequency of Update
The data update frequency is a value that also needs to be known. This is how often the data changes or is updated. If a department is relying on cloud data for reporting, it needs to know when that data was or will be updated. If it knows the data is updated weekly, then that department can plan its reports accordingly – ensuring those reports contain the most up-to-date information.
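As a simple sketch of how a known update frequency can feed into report planning, the functions below check whether data is older than its expected refresh interval and estimate when the next refresh is due. The function names and the weekly interval in the usage note are illustrative assumptions.

```python
from datetime import datetime, timedelta

def is_stale(last_updated, expected_interval, now):
    """True when more time has passed since the last update than the interval allows."""
    return now - last_updated > expected_interval

def next_expected_update(last_updated, expected_interval):
    """Estimate when the next update is due, assuming a fixed interval."""
    return last_updated + expected_interval
```

For example, with a weekly interval of timedelta(days=7) and a last update on January 1, a report run on January 10 would be flagged as using stale data, while one run on January 5 would not.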
In the Cloud, but Not Floating in Isolation
One last element to consider when working with cloud data is its linkage to other data used for reporting. Though it’s “in the cloud,” it isn’t isolated from the other applications (cloud or on-premise) that are also used to gather reporting data. That’s another reason why characterization, definition, and frequency of updates are important.
You may need to combine student data from two different applications. Not only do these applications need to share a uniquely identifying value, such as student ID, to tie them together, but any matching or related data must be up to date and based on the same data characterizations and definitions.
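Combining two such sources can be sketched as a join on the shared identifier. The field name student_id and the record layout are hypothetical, and the sketch assumes each ID appears at most once in the second source.

```python
def normalize_id(value):
    """Compare IDs as trimmed text so 1001 and ' 1001 ' are treated alike."""
    return str(value).strip()

def join_on_student_id(left_rows, right_rows, key="student_id"):
    """Join two lists of records on a shared ID.

    Returns (matched, unmatched): merged records, plus left-side records
    that have no counterpart on the right.
    """
    right_by_id = {normalize_id(row[key]): row for row in right_rows}
    matched, unmatched = [], []
    for row in left_rows:
        other = right_by_id.get(normalize_id(row[key]))
        if other is None:
            unmatched.append(row)
        else:
            # On conflicting field names, the left-side value wins.
            matched.append({**other, **row})
    return matched, unmatched
```

Keeping the unmatched records visible, rather than silently dropping them, helps reveal cases where the two applications disagree about which students exist or how their IDs are formatted.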
For many institutions, cloud-based data is the future, if it isn’t already the present. Even though the data source now resides in the cloud, its data should not be treated any differently than that of an on-premise source. That means your cloud data must still be characterized and defined, and the frequency of its updates must still be noted. Doing so will lead to more accurate, consistent, and trusted reports.