Understanding Data Cubes
Learn reporting terms with a worked example
Cubes
Have you ever wondered why reporting people go on and on about cubes?
To understand, first, imagine a Rubik’s cube, this one is a bit simple, all one colour.
It’s a cube shape with 3x3x3 segments
OK?
Measures
Ok, now imagine this cube represents something you want to report on, let’s say it’s volume of sales
We call this a measure, i.e. things you have measured, on which you wish to report
These things can either be counts of things like Volume of Sales, or can be calculated like a ratio for example — ratios tend to be more powerful than raw statistics
Sometimes these are referred to as metrics, facts, datapoints, KPIs
More about measures
Imagine that each segment of the cube contains a number, the number of sales
I’ve put some numbers on this example cube, they add up to the total
Note that each segment just has a single value, not a value based on the number of faces it presents (i.e. 3 faces on a corner, 2 on an edge, 1 for the centre), it’s just one sales number per whole segment.
Slicing Dimensions
Now imagine the slices of the cube represent different things — let’s say it’s region, we have 3 regions, North, Central, South
This so called slicing allows us to filter by the slice, letting us concentrate on which region is relevant to what we want to know about our business, so if we produce a report for just Central, then that’s known as a Slice of the dataset
Slicing is just another word for filtering
We can see that South gets the fewest Sales By Region
Dicing Dimensions
Now imagine that each of the columns represents a different product type, let’s say we have 3 products — TVs, Radios, Laptops
Chopping the data like this is known as Dicing
Dicing is just another word for sub-totalling
Combining this with Region, we can now additionally report sales by product and make that regional
Looks like South’s Laptop and Central’s Radio sales are areas to target for improvement
More on Dicing
And finally, imagine the depth of the cube is the sales stored for each month, that’s us now down to the individual value for each cube, dicing the data further
We can see here why things are called dimensions Product, Region and Month are effectively like x,y,z co-ordinates for our data
For example 43 laptop sales in central in Feb
In this example we started with a monthly summary, so the grain of the data is monthly. Granularity means the smallest chunk available for reporting.
Worked Example
You might have used Excel Pivot Tables in the past? Those are cubes, Data Warehouses, Data Marts — they all just do the same trick at scale
The coloured excel filters are called “slicers”, and now you know why
Next Steps
For sanity, there are just 3 dimensions in the example, but you’re not limited to just 3 things, you can filter on as many as you want within reason as long as you can capture the data at that level
Knowing how data is structured for reporting can help you make the most of it
Take time to fully understand the worked example, then imagine how you could add to it — sales person, colour, make, model? The more dimensions you add, the greater the opportunity to identify patterns and trends, remember that ratios are almost always more meaningful than simple counts.
This doesn’t explain everything you can possibly know — hierarchies are the next thing you should investigate — Date is the most common hierarchy, imagine if we could choose daily, weekly, monthly, quarterly, annually — that’s a dimensional hierarchy and data is summed for all those levels ready to be plucked out