Understanding Data Cubes

Learn reporting terms with a worked example

Colm Gallagher
4 min readJul 14, 2020
Photo by NeONBRAND on Unsplash

Cubes

Have you ever wondered why reporting people go on and on about cubes?

To understand, first, imagine a Rubik’s cube, this one is a bit simple, all one colour.

It’s a cube shape with 3x3x3 segments

OK?

Measures

Ok, now imagine this cube represents something you want to report on, let’s say it’s volume of sales

We call this a measure, i.e. things you have measured, on which you wish to report

These things can either be counts of things like Volume of Sales, or can be calculated like a ratio for example — ratios tend to be more powerful than raw statistics

Sometimes these are referred to as metrics, facts, datapoints, KPIs

More about measures

Imagine that each segment of the cube contains a number, the number of sales

I’ve put some numbers on this example cube, they add up to the total

Note that each segment just has a single value, not a value based on the number of faces it presents (i.e. 3 faces on a corner, 2 on an edge, 1 for the centre), it’s just one sales number per whole segment.

Slicing Dimensions

Now imagine the slices of the cube represent different things — let’s say it’s region, we have 3 regions, North, Central, South

This so called slicing allows us to filter by the slice, letting us concentrate on which region is relevant to what we want to know about our business, so if we produce a report for just Central, then that’s known as a Slice of the dataset

Slicing is just another word for filtering

We can see that South gets the fewest Sales By Region

Dicing Dimensions

Now imagine that each of the columns represents a different product type, let’s say we have 3 products — TVs, Radios, Laptops

Chopping the data like this is known as Dicing

Dicing is just another word for sub-totalling

Combining this with Region, we can now additionally report sales by product and make that regional

Looks like South’s Laptop and Central’s Radio sales are areas to target for improvement

More on Dicing

And finally, imagine the depth of the cube is the sales stored for each month, that’s us now down to the individual value for each cube, dicing the data further

We can see here why things are called dimensions Product, Region and Month are effectively like x,y,z co-ordinates for our data

For example 43 laptop sales in central in Feb

In this example we started with a monthly summary, so the grain of the data is monthly. Granularity means the smallest chunk available for reporting.

Worked Example

You might have used Excel Pivot Tables in the past? Those are cubes, Data Warehouses, Data Marts — they all just do the same trick at scale

The coloured excel filters are called “slicers”, and now you know why

Next Steps

For sanity, there are just 3 dimensions in the example, but you’re not limited to just 3 things, you can filter on as many as you want within reason as long as you can capture the data at that level

Knowing how data is structured for reporting can help you make the most of it

Take time to fully understand the worked example, then imagine how you could add to it — sales person, colour, make, model? The more dimensions you add, the greater the opportunity to identify patterns and trends, remember that ratios are almost always more meaningful than simple counts.

This doesn’t explain everything you can possibly know — hierarchies are the next thing you should investigate — Date is the most common hierarchy, imagine if we could choose daily, weekly, monthly, quarterly, annually — that’s a dimensional hierarchy and data is summed for all those levels ready to be plucked out

Things to remember

--

--

Colm Gallagher

Hey, I’m Colm, an anlayst from Scotland who’s interested in becoming more productive, so I’m using Medium as my way of writing down and refining my process ❤