SQL Chick

View Original

Why the Default Summarization Property in Power BI is So Important

This post describes an issue I ran into this week on a project. If the Default Summarization settings in your data model are not correct, it results in wrong data being displayed, or even missing data, on your report.

This post is based on Power BI Desktop as of mid November 2017. The concept also is applicable to Tabular models in Azure Analysis Services and SQL Server Analysis Services. 

Let's say I have a small table of data:

By default, Power BI detects numeric columns and sets the summarization property to sum (or count). We can tell this by the sigma symbol next to YearNbr and SalespersonDept, and the Default Summarization property for the column:

What you really want to do is set both SalespersonDept and YearNbr to "Don't summarize" as shown here:

Let me explain why setting this summarization property is so very important.  

Summarization Behavior on the Report Canvas

Since Bob has 4 rows of data, when we put this data on the canvas, the SalespersonDept and YearNbr sum themselves up. If you don't know the data well enough you might not even notice these are bogus number (especially SalespersonDept in our example). So, not only is the data being displayed incorrectly, you can also waste some time trying to test or cross-reference SalespersonDept 4004 because it doesn't exist.

Now, you might say to me that it's an annoyance and you can fix it in the visual. Of course you can, but do you want to have to fix it on *every* visual you create? What if you don't catch the fact that SalespersonDept is a bogus number? So if that doesn't convince you to set the summarization property on the dataset, I'll bet this next reason will...

Summarization Behavior in Visual Filters

Let's say I add a visual filter on my table because I only want to show Bob's rows of data. Bob is SalespersonNbr 1001. However, the visual filter does not return any rows of data for 1001 -- the only way it finds data is if we search on the bogus *aggregated* value for SalespersonNbr 4004 (which is 1001 * 4 rows of data):

Not a good user experience, right? So the message here is that the visual level filters are applied *after* the data aggregation occurs within the individual table or chart. Note that slicers, page filters, and report filters don't behave this same way--they each return the correct data based on SalespersonDept 1001 because they filter before summarization. It's just visual level filters that are applied after the data aggregation occurs in the visual (thanks to Javier Guillen for confirming I wasn't crazy!).

Options for Resolving Aggregation Issues

First choice: Solve this in the underlying data storage. For instance, if you have a data warehouse SalespersonDept should be a varchar(4) instead of an integer. With this choice, no downstream reporting tool whatsoever will try to aggregate columns like SalespersonDept or YearNbr because the column will be detected as text by Power BI. Of course, we live in the real world and this can't be done unless we're creating a brand new source database. Which brings us to...

Second choice (shown in this post): Set the summarization setting in the data model correctly for every column in every table. Another way is to set the data type to text. The dataset serves as our semantic layer, so by setting this in the data model then each report visual can take advantage of this setting. This is the most common way of handling the issue.

Third choice (worst choice): Your last choice is to change the aggregation in every individual visual. This is risky because what if you forget? Or what if someone new creates a new report and doesn't know? So, don't do this please. The previous choice of handling in the data model is a much more sound practice.

You Might Also Like...

Reusing Datasets Imported to the Power BI Service