For making complex data-driven decisions, you need a complex data visualization tool. Happily, OLAP (short for On-Line Analytical Processing) is fast becoming a ubiquitous method for the advanced analysis of multi-dimensional information. In Argos, our OLAP engine is fast and incredibly intuitive; the most difficult part is the interpretation of the information it provides. This post will introduce you to our OLAP environment and explain some of the best practices we employ when developing OLAP cubes for our clients. First, some important terminology.
OLAP Terms Defined
- On-Line Analytical Processing (OLAP): The technological infrastructure used to provide ad-hoc user query capability and multi-dimensional analysis.
- Measure (or Fact): The numerical values being analyzed: headcount, revenue, GPA, SAT scores, etc.
- Dimension: The “buckets” into which the measures are sorted, for analysis; term code, year, month, quarter, bio-demographics, geographic region, etc.
- Level of Granularity: The deepest level of detail that’s included in the OLAP’s dataset.
NOTE: For an example, let’s say you have an OLAP cube with ‘student’ as the level of granularity. That would mean that you would be able to analyze information for individual students, such as GPA, headcount, and test scores, but you would NOT see student coursework because a single student can register for many courses. To analyze coursework information, you would need to set the OLAP’s level of granularity all the way down to the course level.
What is OLAP, really?
OLAP is an answer engine whose goal is to put the power of finding answers into the hands of non-technical individuals. It does this by grouping measures into logical groups and allowing the user to slice the data into different views through the use of dimensions. It creates a mechanism for easy longitudinal reporting (analysis over time) as well as trend analysis. The questions OLAP focuses on typically start with some like “How many,” “How much,” or “What is the average.”
Design Process & Best Practices
The first step when designing an OLAP cube is to identify the level of granularity, the cube’s measures, and the cube’s dimensions. This is important because trying to incorporate dimensions or measures in a cube that are not consistent with the granularity will cause inaccurate calculations and thus incorrect answers. For example, let’s assume you were hoping to count students by term but you set your level of granularity at the course level. For each student taking more than one course in a given term, they would be counted more than once! Obviously, this leads to confusion and inaccuracy, and is something that must be carefully watched during OLAP development.
Next, we need to determine what measures and dimensions will be needed. The best way to do this is to ask your target user groups the following questions:
- What are the three most frequent questions you are asked that this OLAP cube should answer?
- What, if any, are the federal or state reporting requirements related to this data?
- What are the three most important questions about this data that you’ve always wished you could answer but never have?
The answers to these questions typically look something like this:
- What is the unduplicated headcount by term, gender, & major?
- What was last year’s retention rate?
- How many Psychology 101 sections should I offer next fall?
So, using the answers above, you can create a measure and dimension breakdown something like this:
- Measures: FTE; Dimensions: Year, Term, Gender, Age, Department, Major, & Area Code
- Measures: Retention Rate; Dimension: Year
- Measures: Headcount, FTE, & Credits; Dimensions: Year, Term, Subject, & Course Number
By picking apart the questions that your audience needs to answer, you can inform the design of the dataset used to populate the OLAP you’re designing. This is, by far, the hardest part of designing an OLAP cube and note: we have not yet written a single line of code.
Alas, it couldn’t last forever. Now it’s time to write the query required to pull the measures and dimensions as identified above. With OLAP this is usually quite simple, as it does not typically require any aggregation or grouping (the OLAP engine handles that for you). The example query below would get the measures and dimensions required for the first question above.
NOTE: This query is written for a standard Banner installation that includes the student module.
As you can see, this query does zero aggregation. It pulls all registered students that exist in your Banner database and feeds that into the cube. The cube uses that raw data to build its multi-dimensional view, which then enables the user to pose questions and analyze the dataset.
While this is just one example of the kind of questions an OLAP cube can help you answer, you can see how this feature serves as a powerful tool for data analysis. If you’ve struggled to get clear answers to complex questions, it’s definitely worth taking the time to learn how to harness the power of OLAP.
Do you already use OLAP cubes for data analysis at your institution? We’d love to hear about the kinds of questions they help you answer—please share in the comments below!
Latest posts by Zach Heath (see all)
- Best Practices for Designing OLAP Cubes - 05/01/2014
- Deconstructing Business Intelligence & Analytics for Higher Ed - 12/04/2013
- Configuration Tips for Improving your Data Security - 08/15/2013