Your institution operates on data. Lots and lots of data. In order to run your university or college effectively, you need to make sense of that data. This means that all of that information must be stored, organized, and made available for reporting.
Often that is accomplished through the use of a data warehouse: a central repository of integrated data from one or more disparate sources. In higher education, this raw data typically comes from the school’s Enterprise Resource Planning (ERP) system (e.g., Banner®, Colleague®, or PowerCampus®), which is usually considered the ‘source of truth.’
Function and Benefits of a Data Warehouse
The main function of the warehouse is to maintain a copy of the information from the ERP (aka, source transaction) system, housing current and historical data from multiple sources across the enterprise. By doing this, it can maintain data history (even if the source transaction systems don’t), enable a central view across the enterprise, improve data quality, present the institution’s information consistently, and add value to operational business applications.
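To make the point about data history concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. It assumes a source table that only keeps each student's current major and a warehouse table that stores dated snapshots. The table names (erp_student, wh_student_history) and the function names are hypothetical, chosen only for illustration, and do not come from any particular ERP.

```python
import sqlite3


def snapshot_students(conn, as_of):
    """Copy the source system's current rows into the warehouse history table."""
    conn.execute(
        "INSERT INTO wh_student_history (student_id, major, as_of) "
        "SELECT student_id, major, ? FROM erp_student",
        (as_of,),
    )


def major_history(conn, student_id):
    """Return every (as_of, major) pair the warehouse has kept for one student."""
    return conn.execute(
        "SELECT as_of, major FROM wh_student_history "
        "WHERE student_id = ? ORDER BY as_of",
        (student_id,),
    ).fetchall()


def build_demo():
    """Build an in-memory demo: a source table plus a warehouse history table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical source table: holds only the current value.
    conn.execute(
        "CREATE TABLE erp_student (student_id INTEGER PRIMARY KEY, major TEXT)"
    )
    # Hypothetical warehouse table: one row per student per snapshot date.
    conn.execute(
        "CREATE TABLE wh_student_history "
        "(student_id INTEGER, major TEXT, as_of TEXT)"
    )
    conn.execute("INSERT INTO erp_student VALUES (1, 'Biology')")
    snapshot_students(conn, "2023-09-01")
    # The source system overwrites the old value when the student changes majors.
    conn.execute("UPDATE erp_student SET major = 'Chemistry' WHERE student_id = 1")
    snapshot_students(conn, "2024-01-15")
    return conn
```

Calling major_history(build_demo(), 1) returns both the earlier and the later major, even though the source table now holds only the latest one. A production warehouse would likely track changes more carefully than a full snapshot, but the idea is the same.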
It can also be used for creating analytical reports throughout the enterprise. While the data warehouse allows for complex analytical queries without impacting operational systems, it may not always be the best source for reporting. That role often falls to a data mart.
The Data Mart
A data mart is a focused, fully automated Extract, Transform and Load (ETL) and reporting solution, built to answer specific, frequently asked questions or to address particular needs. A data mart is like the data warehouse’s little brother. It can also pull in data from disparate sources, but it does not store nearly the amount of data that the warehouse does.
When built, the data mart can either tie into the data warehouse or directly into the transactional system. In either case, the fields within a data mart should be carefully chosen to ensure they pertain only to the specific information being sought.
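As a rough illustration of choosing only the fields that matter, the sketch below runs a tiny extract, transform, and load step from a warehouse table into a data mart table. It uses Python's built-in sqlite3 module. The table names (wh_enrollment, mart_enrollment), the columns, and the question being answered ("how many credits is each student carrying per term?") are all assumptions made up for this example.

```python
import sqlite3


def build_demo_warehouse():
    """Create a hypothetical warehouse table with more columns than the mart needs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wh_enrollment (student_id INTEGER, term TEXT, "
        "credits TEXT, status TEXT, advisor_note TEXT, home_address TEXT)"
    )
    conn.executemany(
        "INSERT INTO wh_enrollment VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, " 2024fa ", "3", "ENROLLED", "note a", "1 Main St"),
            (1, "2024FA", "4", "ENROLLED", "note b", "1 Main St"),
            (2, "2024FA", "3", "WITHDRAWN", "note c", "2 Oak Ave"),
        ],
    )
    return conn


def load_enrollment_mart(conn):
    """Extract three fields from the warehouse, clean them, and load the mart."""
    conn.execute("DROP TABLE IF EXISTS mart_enrollment")
    conn.execute(
        "CREATE TABLE mart_enrollment "
        "(student_id INTEGER, term TEXT, credits INTEGER)"
    )
    # Extract: only the fields that pertain to the question being asked.
    rows = conn.execute(
        "SELECT student_id, term, credits FROM wh_enrollment "
        "WHERE status = 'ENROLLED'"
    ).fetchall()
    # Transform: normalize term codes and store credits as integers.
    cleaned = [(sid, term.strip().upper(), int(credits)) for sid, term, credits in rows]
    # Load: write the cleaned rows into the mart table.
    conn.executemany("INSERT INTO mart_enrollment VALUES (?, ?, ?)", cleaned)
    return len(cleaned)
```

Notice that advisor_note and home_address never reach the mart. The same function could just as easily read from the transactional system instead of the warehouse; only the SELECT statement would change.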
Because a data mart is so focused, it lends itself to more efficient reporting. Reports run against a data mart typically complete much more quickly than the same reports run against a data warehouse or the transactional system itself, because the queries have far less information to search through. In addition, the data itself is more dimensional, or reportable. Data used for reporting is different from the data used for transactions. And it is this data that can be grouped into tables for quicker analysis or into charts and graphs for better visualization.
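Here is a small sketch of what "grouped into tables for quicker analysis" can look like in practice. It assumes a hypothetical data mart table named mart_enrollment with one row per student per course, and it uses Python's sqlite3 module to summarize headcount and total credits by term. The table, columns, and sample rows are invented for illustration.

```python
import sqlite3


def build_demo_mart():
    """Create a hypothetical mart table holding one row per student per course."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mart_enrollment "
        "(student_id INTEGER, term TEXT, credits INTEGER)"
    )
    conn.executemany(
        "INSERT INTO mart_enrollment VALUES (?, ?, ?)",
        [
            (1, "2024FA", 3),
            (1, "2024FA", 4),
            (2, "2024FA", 3),
            (1, "2025SP", 3),
        ],
    )
    return conn


def credits_by_term(conn):
    """Summarize the mart: distinct students and total credits for each term."""
    return conn.execute(
        "SELECT term, COUNT(DISTINCT student_id) AS headcount, "
        "SUM(credits) AS total_credits "
        "FROM mart_enrollment GROUP BY term ORDER BY term"
    ).fetchall()
```

With the sample rows above, credits_by_term(build_demo_mart()) returns one summary row per term, which is the kind of small result set that can be dropped straight into a chart or a report.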
I have already noted some key differences between a data warehouse and a data mart. There are still other differences that must be acknowledged and considered as you design your institution’s data storage and reporting systems. These differences revolve around the time, cost, and people it takes to build and manage the respective systems.
Because a data warehouse is much larger, with a higher level of complexity, it will take more time to build than a data mart. The integration of the information sources, along with the structure and relationships of the data, must be well thought out. (The same applies to a data mart, but its smaller size and less complicated structure make this a simpler process.)
As would also be expected, data warehouses are more expensive than data marts to build and maintain. Warehouses require more physical resources, such as servers, disk space, memory, and CPUs. While data marts may require some, or all, of these resources, they would do so on a much smaller scale. But it is also possible that a data mart can be built using the resources already installed as part of the data warehouse. This is clearly a more affordable option, but one that is dependent on the structure of the data warehouse and needs to be accounted for when the warehouse is first constructed.
You may have noticed that the list of physical resources in the previous paragraph omitted a key resource: people. You must take into account the number of people who will be required to manage the data, relationships, and processes. It goes without saying – but I’m going to say it anyway – it is much more difficult, and takes many more staff hours, to maintain a data warehouse than it does a data mart. Insufficient staff can lead directly to inefficient reporting, which defeats the whole purpose of having these systems in the first place.
If you build it…
While a school will probably have just one data warehouse, it is very reasonable to have multiple data marts in place throughout campus. When one is successfully built, maintained, and utilized, other reporting bodies will want to copy that success. If a college’s Institutional Research department is using a data mart to great effect, other campus offices may see that and request one to satisfy their own reporting needs. Or maybe the IT department recognizes the benefits and decides to devote the resources to implement similar solutions for other departments. Either way, it would behoove your institution to plan for the possibility of multiple data marts.
While they may be brothers, there should be no sibling rivalry between data warehouses and data marts. With their respective strengths, complexities, and nuances, along with the resources it takes to make them run efficiently, they can work together to provide the data institutions need to better serve their students.
I could not agree more with Peter. The transformative power of our data mart has made information accessible to research. They are able to create visualizations that shine lights on areas of success and areas needing enhancements. We would not have been able to take this huge step forward without the assistance of Evisions and Peter Wilbur.