Why You Should Create Reports in Power BI Desktop Instead of the Power BI Service

This post is to make you aware that there are some serious downsides when you choose to create reports in the Power BI Service rather than Power BI Desktop. If you agree with me on this one, please vote for this idea here on the Power BI Ideas site.

The Short Version of the Story

I always recommend to Power BI authors that report creation & editing should happen in Power BI Desktop and to just ignore the edit capability in the Power BI Service. Usually my reasons are concerned with:

  1. Which version is the latest version (if both Desktop and the web are being used by one or more people)?

  2. We’d like to have version history available (not available with the web but can be if you store the PBIX file on OneDrive for Business, SharePoint, or a source control repository).

  3. Reducing the risk of someone overwriting someone else’s work because two different report editing/publishing options are in use.

These are very valid concerns, but I’ve discovered one additional problem that could easily confuse some people: the side effects experienced if you download then re-upload a web-created report. Let me explain…

If you create a report directly in the Power BI Service *theoretically* this is like using the Power BI Dataset Live Connection, right? Because we are prompted to choose a dataset, this means we are reusing an existing dataset for our new report. Conceptually this is a great thing, but in reality the hub-and-spoke model (with the dataset being the hub and the various reports being the spokes) only works well if you generate all reports from Power BI Desktop.

Here’s the issue: when you download a web-created report (i.e., created in the Power BI Service), the dataset actually comes with it in the downloaded PBIX file. Unfortunately it does not download as the equivalent of a Power BI Dataset Live Connection. Then, if you turn around and re-upload the PBIX to the Power BI Service --- you end up with a duplicate report and (even worse) a duplicate dataset. I understand the product team wants everything to work seamlessly for busy users — however, I'd much prefer if it behaved seamlessly with a Power BI Dataset Live Connection for the downloaded file.

The moral of this story: Always create all content in Power BI Desktop if you possibly can. Don’t rely on the ability to download from the Power BI Service. I elaborate on this at the bottom of this post. But first, let me walk you through the process.

The Long Version of the Story

I have a dataset published to the Power BI Service which is called Sales Data Model:

PBI_SalesDataModel.jpg

I’m going to create a report directly in the web so that we can illustrate the issue I’m concerned about:

PBI_CreateReport.jpg

I’ll select the existing dataset for my report being created in the web:

PBI_ChooseADataset.jpg

Next I’ll create my really fancy report and save it in the Power BI Service:

PBI_SaveReport.jpg

Other than the name I chose to use, of course we cannot tell from the report list if it was created in the web or from Power BI Desktop. (And to be fair, if we stopped here we wouldn’t actually have any issues with it being used in the Power BI Service.) At this point, I see my report displayed in the report list:

PBI_Report.jpg

Now let’s download the report we just generated:

PBI_DownloadReport.jpg

After choosing a location to save the download locally, we get our first clue that something is wrong:

PBI_DownloadedReport.jpg

In the above image, see how it’s 1.2 MB? That’s pretty big for my fancy report that has one whole chart. This is our first hint that the dataset has just come along during the download process.

Let’s pretend that we made some edits in Power BI Desktop and that we want to publish them back to the Power BI Service. We can publish the pbix from Desktop, or upload from the Service:

PBI_UploadReport.jpg

Now that we’ve re-uploaded the edited pbix file, we see the issues. The first issue is we now have two reports named the same thing:

PBI_DuplicateReports.jpg

And the second issue, even worse, is we have duplicate datasets (this occurred since the dataset came along during the download):

PBI_DuplicateDatasets.jpg

Summary of Recommendations

Make it a standard practice to always create reports in Power BI Desktop rather than the Power BI Service. This has the following advantages:

  • You always know that the Power BI Desktop file saved to your standard location is the latest version because you trust your process.

  • You always have version history available (assuming that you’re using a standard location such as OneDrive for Business, SharePoint Online, or similar).

  • You have explicit control over what the dataset is doing: import, live connection, or DirectQuery. Use a Power BI Dataset Live Connection whenever you have an existing dataset already published to the Power BI Service that has the data you need (to reduce the # of times the data is stored, reduce work across different datasets, and reduce the # of refresh operations). By doing this from Power BI Desktop, we accomplish exactly what we want in that regard.

If you do want to create reports in the web (or someone with workspace permissions edited a report in the web):

  • Don’t plan to re-upload the downloaded PBIX file. I heard someone presenting a webinar recently recommended using the web for creating reports & that you always have the downloaded PBIX to fall back on. That makes me nervous because busy business users won’t all understand what is happening and could easily end up with duplicated reports & datasets if they go back and forth as described above.

  • If you do need to sync up a report in Power BI Desktop, copy and paste the edited visuals from the downloaded PBIX to the actual PBIX. Be sure to discard the downloaded pbix file since it contains the duplicated dataset. This will help you recoup the work that someone did in the web, but not run the risk of duplicating reports and datasets as described above.

Please Vote to Change the Functionality

If you agree with me on this one, please vote for this idea here on the Power BI Ideas site.

You Might Also Like…

Three Ways to Use Power BI Dataflows

Terminology Check - What are Data Flows?

Terminology Check - What is a Power BI App?

Three Ways to Use Power BI Dataflows

Dataflows in Power BI are a really interesting capability for centralizing and reusing Power Query logic among many different PBIX files. If you’ve found yourself copying M scripts (Power Query formula language) between files, or creating a library of M scripts then you should look into dataflows.

I won’t rehash the basic overview info here because it has been well covered by other bloggers. In particular, Matthew Roche has been writing a lot of good stuff:

This post is to focus on what I’m referring to as ‘the three ways to use Power BI dataflows.’ I talked about this on the March BlueGranite Office Hours session on Power BI Dataflows (video recording | slides).

Dataflows1.jpg

Dataflows Option 1: Fully Managed by Power BI

In this first option, Power BI handles everything. We use the web-based Power Query Online tool for structuring the data. Power BI handles scheduling the data refresh.

The underlying data behind the dataflow is stored in a data lake. However, since it’s fully managed, this data lake is not directly accessible or visible to the customer. As with most cloud-based implementations, the infrastructure is hidden under the covers. This is what is happening if your users are utilizing dataflows currently but you haven’t specified a data lake account in the Power BI admin center.

Dataflows_Option1.jpg

Option 1 above is useful when all of your data preparation is done by analysts and Power BI is the only tool which accesses that data.

Dataflows Option 2: Managed by Power BI with an Explicit Data Lake

In this second option, the diagram looks exactly the same as option 1. The key difference is that we have associated our data lake account to Power BI. This is the “bring your own data lake” scenario which means we can actually view the data behind the scenes with a tool such as Azure Storage Explorer (dataflows data is comprised of csv files and the model.json file which contains the schema and other metadata).

With this option we are still using Power Query Online and Power BI for the data refresh, so it’s still very business user-friendly in terms of maintaining the data.

Dataflows_Option2.jpg

Option 2 above is useful when your data analysts like using Power Query Online for data prep, but you still want the ability to access the data independently of Power BI.

Dataflows Option 3: External Dataflows (Managed Outside of Power BI with an Explicit Data Lake)

With the third option, we are maintaining the data with another tool such as Azure Databricks or Azure Data Factory. You could even use something as simple as a text editor if you’d like - Chris Webb talks about that in this blog post. Power BI’s role here is reduced to being a consumer of the data.

Although Power BI doesn’t take responsibility for updating external dataflows, the dataflow can be consumed by PBIX files like any other dataflow. This is great because the user experience for using the dataflow output doesn’t change.

Dataflows_Option3.jpg

Option 3 above is useful when your data lake is part of your strategic data architecture and is integrated with other Azure technologies**. It is more complicated though because you’re also responsible for managing the metadata in the model.json file which accompanies the actual data files. When you have a balance of corporate BI and managed self-service BI needs, a combination of options 2 and 3 are useful.

**Use of dataflows does currently require a filesystem in ADLS Gen2 called “powerbi” so at this point in time (March 2019) we cannot refer the dataflow to just any folder in the data lake. I hope this changes because segregating this data doesn’t lend itself well to a corporate data lake strategy. Also, naming convention junkie that I am…using the name of “powerbi” for the filesystem also doesn’t make 100% sense when there’s other tools involved. I’d rather rely on my normal process of organizing the data lake by subject area and security boundaries.

Summary of the 3 Options

It’s important to be aware that all 3 options can be used simultaneously in the same Power BI tenant.

This table summarizes the above options and (very) loosely correlates them to the types of BI implementations they support:

Dataflows_3Options_Table.jpg

Consider this expanded scenario:

DataflowsInPowerBI.jpg

Workspace A is managed by Power BI.

  • If Workspace A is *not* associated to the dataflows storage, the dataflow functionality works normally but data in the lake is not accessible other than through Power Query Online. This is Option 1.

  • If Workspace A *is* associated to the dataflows storage, it’ll be visible in ADLS Gen2.  This is Option 2.

Workspace B is associated to dataflows storage, and is managed outside of Power BI. This is option 3.

Important Settings in the Power BI Admin Center

To allow dataflows to be utilized by your users (i.e., to use any of the 3 options above), they need to be enabled in the tenant settings:

Dataflows_TenantSettings.jpg
 

To be able to “bring your own data lake” (i.e., to use options 2 or 3), you need to do two things. The first task is to associate your Azure Data Lake Storage Gen2 account to the Power BI tenant:

DataflowsSettings.jpg
 

Note that there are currently (as of March 2019) some pretty big limitations with the above setting:

  1. You can only associate on ADLS Gen2 account for your entire Power BI tenant.

  2. It requires an ADLS Gen2 filesystem called “powerbi” which means we cannot point to just any folder in the data lake.

  3. Once the ADLS Gen2 account is associated to Power BI, it cannot be changed. So be very careful with POCs and such until this becomes more flexible.

The second required task to make options 2 or 3 work is a workspace-specific storage setting for every workspace where dataflows might exist:

Dataflows_WorkspaceStorageSetting.jpg

If the workspace storage setting for dataflows is set to No, then the dataflows in that workspace are using option 1 (which means they still work just fine, you just won’t see the data in your data lake even if you’ve set the data lake account in your tenant settings). If the workspace storage setting is set to Yes, then you are free to use options 2 or 3.

To facilitate the above workspace storage setting, there is one more property to be aware of. In the same pane as where we specify the ADLS Gen2 account, there is a Yes/No toggle to allow workspace admins to assign workspaces to the storage account:

DataflowSettings_WorkspaceOwners.jpg
 

At this time the workspace default for dataflows storage is set to No, so it’s easy to forget — I’m hoping that default changes to Yes if there’s a data lake specified in the tenant settings.

Requirements for ‘Bring Your Own Data Lake’

Here’s a final summary of the requirements to be able to implement option 2 or 3:

Dataflows_RequirementsForBringYourOwnDataLake.jpg

How Permissions Work for a Power BI Service Administrator

A Power BI administrator is a role for managing various aspects of the Power BI Service. This role can be assigned in Office 365. Anyone with Office 365 global admin privileges is also a Power BI administrator by default.

Based on the tests I've been doing, I've observed that users with membership to the Power BI administrator role have two sets of permissions apply:

  • Activities which are scoped across the entire organization

  • Activities for which normal user permissions apply

Within the above 2 categories, I’m thinking there are 4 main types of activities:

  1. Manage tenant settings (always scoped to the organization)

  2. Compile inventory and metadata (can be scoped to the organization)

  3. Manage workspace users (can be scoped to the organization)

  4. Export content from a workspace (relies on user permissions)

PowerBIAdministrator_CategoriesOfResponsibility.jpg

My motivation for laying this out is because initially I expected that all of the PowerShell scripts (including the Export-PowerBIReport cmdlet discussed below) would apply organization-wide. This made me think that the Power BI administrator role had become a highly elevated role which could access all content in the Power BI tenant. However, my initial expectation was wrong in that respect — it turns out that a Power BI administrator can access all metadata but not all of the actual data.

Next let’s briefly review each of the 4 types of activities.

Manage Power BI Tenant Settings

The ability to manage tenant settings in the Power BI Admin Portal, has been in place for some time now. It includes managing things such as:

PowerBIAdminPortalTenantSettings.jpg
  • Tenant settings (for the most part this includes enabling/disabling certain features to influence the user experience and/or govern the system)

  • Capacity

  • Embed codes

  • Organizational custom visuals

The Power BI administrator role cannot be delegated to individual subsets of the organization — it applies to the entire tenant.

The role also cannot be granted in a read-only way. This can make it challenging in a very large organization. For example, let's say you're a large worldwide organization with five main divisions. One of the key Power BI people from division A requests access to the Power BI Admin Portal because they want to be able to view what the settings are. An example I've seen of this is someone thinking that the 'push apps to end users' doesn't work, when really the issue is that it's disabled by default in the tenant settings.

Compile Power BI Inventory and Metadata

With the introduction of the Power BI Management Module, we can more easily run scripts to perform certain activities such as accessing metadata. There are several cmdlets available, for instance: Get-PowerBIDashboard, Get-PowerBIReport, and Get-PowerBIWorkspace.

Here’s an example of a script which looks across the entire tenant (i.e., the organization scope) to find all instances of a report named Product Sales Analysis:

Get-PowerBIReport.jpg

The key point above is that the Power BI administrator can retrieve all metadata like this, including My Workspace for other users. This is actually great because it means a Power BI administrator can put together an inventory of the content in the tenant. If you compare this to usage data from the Office 365 audit log, you could determine what isn’t being used.

Manage Workspace Users

There are PowerShell cmdlets such as Add-PowerBIWorkspaceUser and Remove-PowerBIWorkspaceUser to manage the new type of workspaces (i.e., the V2 ‘new workspace experience’ that is in preview as of this writing in late 2018).

Here is an example of my Power BI administrator account providing member permissions to a colleague:

Add-PowerBIWorkspaceUser.jpg

The interesting part of the above example is that my Power BI administrator account does *not* have any direct permissions to the workspace. However, the organization scope allows it to be done.

Export Power BI Content from Workspaces

There is a PowerShell cmdlet called Export-PowerBIReport which, as the name implies, exports a PBIX from the Power BI Service. This includes the report and the underlying data.

Initially I thought this would also be able to be done with an organization-wide scope, but that’s not true. (Though if that were true, that would open up some interesting scenarios like exporting files to make backups, and/or exporting files to minimize risk that critical data is being delivered from a user's individual workspace…but that’s not possible right now. And if it were, I would want that highly privileged role to be separate somehow from the existing Power BI administrator role.)

Here is an example of exporting one file:

Export-PowerBIReport.jpg

Unlike the previous two examples, Export-PowerBIReport does require the Power BI administrator to have rights to the workspace in order to access the content. This is what I was observing, so I was happy to have it confirmed by Chaz Beck (CodeCyclone) from the Power BI product team when he replied to this GitHub issue. This is a legitimate unauthorized (401) message when a Power BI administrator tries to export a PBIX that resides somewhere that the administrator doesn’t have access to - this includes My Workspace for all other users.

One limitation to this: reports and dashboards created directly in the Power BI Service cannot be exported in this manner, which means this method cannot be used for migrating content across tenants.

Also, this cmdlet appears to only work with the new workspace experience that is currently in public preview. I get an unauthorized (401) message, even if I’m an admin on the workspace itself, if I try this on a V1 workspace. This is confirmed as a bug on this GitHub issue.

Summary

Hopefully this post saves you some time in determining how permissions apply to the different types of activities that a Power BI administrator can do.

 
PowerBIAdministrator.jpg
 

The scripts shown above are super simplified, not really ready for actual production use, but I kept them simple since the focus of this post is on permissions.

Also, keep in mind that you do *not* have to be a designated Power BI administrator to use the Power BI PowerShell cmdlets — any user can run them related to their own content. However, you *do* need to be a Power BI administrator in order to set the scope to organization (for those cmdlets which support it).

To find additional information:

You Might Also Like

Terminology Check - What is a Power BI App?

Lesson Learned - Keep PowerShell Modules Consistent and Up To Date

Checklist for Finalizing a Model in Power BI Desktop

Terminology Check - What are Data Flows?

It's another terminology post! Earlier this week I was having a delightful lunch with Angela Henry, Kevin Feasel, Javier Guillen, and Jason Thomas. We were chatting about various new things. Partway thru our conversation Jason stops me because he thought I was talking about Power BI Dataflows when I was really talking about Azure Data Factory Data Flows. It was kind of a funny moment actually but it did illustrate that we have some overlapping terminology coming into our world.

So, with that inspiration, let's have a chat about some of the new data flow capabilities in the Microsoft world, shall we?

Azure Data Factory Data Flow

The new Azure Data Factory (ADF) Data Flow capability is analogous to those from SSIS: a data flow allows you to build data transformation logic using a graphical interface. A really interesting aspect about ADF Data Flows is that they use Azure Databricks as the runtime engine underneath -- however, you don't actually have to know Spark or Databricks in order to be able to use ADF Data Flows. The goal is for it to be a low code/no code way to transform data at scale.

Follow Mark Kromer and the ADF team on Twitter to stay up to date on the rollout of the preview.

 

More info on ADF Data Flow can be found here: https://aka.ms/adfdataflowdocs

Power BI Dataflows

Power BI Dataflows (yes, this one is branded as one word) are a new type of object in a Power BI Workspace which will allow you to load data into a Common Data Model. Data is loaded via a web-based version of Power Query, which is why this capability is referred to as self-service data prep. The resulting data is stored in Azure Data Lake Storage Gen 2. Once in the Common Data Model in the data lake, it can be reused among various Power BI datasets -- allowing the data load, transformations, and cleansing to be done once rather than by numerous PBIX files. This capability was known for a little while during the private preview as Power BI Datapools or as 'Common Data Service for Analytics' (CDS-A) -- but the final name looks like it's going to be Power BI Dataflows.

It's still early so there's not a lot of info available online yet. James Serra wrote up a nice summary and has a few links on his blog. Also, here's a diagram that Chris and I included in the recently updated whitepaper Planning a Power BI Enterprise Deployment which shows our initial understanding of the Power BI Dataflows capability:

Note that Pro users can use Power BI Dataflows without requiring Premium. However, my hunch is that this capabililty will be most appealing for data at scale - i.e., the features that Premium offers with respect to Power BI Dataflows will be pretty compelling, which is why Premium is depicted in the diagram above.

SSIS Data Flow

Data flows have long been a key part of SQL Server Integration Services (SSIS) for data transformations, just like the new capability being added to ADF discussed above. As of Azure Data Factory V2, we can also host and execute SSIS packages in Azure from ADF V2. 

Microsoft Flow

Just for completeness I'll cover one more product which is similarly named. Flow is an Office 365 service for workflow automation between services. It can be used in conjunction with PowerApps and Power BI for different types of workflow automation. Flow lets you do things like approval requests, sending an e-mail alert, or creating a task in a project management system.

Now you know there are multiple types data flows being launched into the world of Microsoft BI (in addition to the good old SSIS data flows we've had forever). Now you can cleverly watch out for which one is being bantered about in your techie conversations. 

You Might Also Like...

Three Ways to Use Power BI Dataflows

Terminology Check - What is a Power BI App?