Three Ways to Use Power BI Dataflows

Dataflows in Power BI are a really interesting capability for centralizing and reusing Power Query logic among many different PBIX files. If you’ve found yourself copying M scripts (Power Query formula language) between files, or creating a library of M scripts then you should look into dataflows.

I won’t rehash the basic overview info here because it has been well covered by other bloggers. In particular, Matthew Roche has been writing a lot of good stuff:

This post is to focus on what I’m referring to as ‘the three ways to use Power BI dataflows.’ I talked about this on the March BlueGranite Office Hours session on Power BI Dataflows (video recording | slides).

Dataflows1.jpg

Dataflows Option 1: Fully Managed by Power BI

In this first option, Power BI handles everything. We use the web-based Power Query Online tool for structuring the data. Power BI handles scheduling the data refresh.

The underlying data behind the dataflow is stored in a data lake. However, since it’s fully managed, this data lake is not directly accessible or visible to the customer. As with most cloud-based implementations, the infrastructure is hidden under the covers. This is what is happening if your users are utilizing dataflows currently but you haven’t specified a data lake account in the Power BI admin center.

Dataflows_Option1.jpg

Option 1 above is useful when all of your data preparation is done by analysts and Power BI is the only tool which accesses that data.

Dataflows Option 2: Managed by Power BI with an Explicit Data Lake

In this second option, the diagram looks exactly the same as option 1. The key difference is that we have associated our data lake account to Power BI. This is the “bring your own data lake” scenario which means we can actually view the data behind the scenes with a tool such as Azure Storage Explorer (dataflows data is comprised of csv files and the model.json file which contains the schema and other metadata).

With this option we are still using Power Query Online and Power BI for the data refresh, so it’s still very business user-friendly in terms of maintaining the data.

Dataflows_Option2.jpg

Option 2 above is useful when your data analysts like using Power Query Online for data prep, but you still want the ability to access the data independently of Power BI.

Dataflows Option 3: External Dataflows (Managed Outside of Power BI with an Explicit Data Lake)

With the third option, we are maintaining the data with another tool such as Azure Databricks or Azure Data Factory. You could even use something as simple as a text editor if you’d like - Chris Webb talks about that in this blog post. Power BI’s role here is reduced to being a consumer of the data.

Although Power BI doesn’t take responsibility for updating external dataflows, the dataflow can be consumed by PBIX files like any other dataflow. This is great because the user experience for using the dataflow output doesn’t change.

Dataflows_Option3.jpg

Option 3 above is useful when your data lake is part of your strategic data architecture and is integrated with other Azure technologies**. It is more complicated though because you’re also responsible for managing the metadata in the model.json file which accompanies the actual data files. When you have a balance of corporate BI and managed self-service BI needs, a combination of options 2 and 3 are useful.

**Use of dataflows does currently require a filesystem in ADLS Gen2 called “powerbi” so at this point in time (March 2019) we cannot refer the dataflow to just any folder in the data lake. I hope this changes because segregating this data doesn’t lend itself well to a corporate data lake strategy. Also, naming convention junkie that I am…using the name of “powerbi” for the filesystem also doesn’t make 100% sense when there’s other tools involved. I’d rather rely on my normal process of organizing the data lake by subject area and security boundaries.

Summary of the 3 Options

It’s important to be aware that all 3 options can be used simultaneously in the same Power BI tenant.

This table summarizes the above options and (very) loosely correlates them to the types of BI implementations they support:

Dataflows_3Options_Table.jpg

Consider this expanded scenario:

DataflowsInPowerBI.jpg

Workspace A is managed by Power BI.

  • If Workspace A is *not* associated to the dataflows storage, the dataflow functionality works normally but data in the lake is not accessible other than through Power Query Online. This is Option 1.

  • If Workspace A *is* associated to the dataflows storage, it’ll be visible in ADLS Gen2.  This is Option 2.

Workspace B is associated to dataflows storage, and is managed outside of Power BI. This is option 3.

Important Settings in the Power BI Admin Center

To allow dataflows to be utilized by your users (i.e., to use any of the 3 options above), they need to be enabled in the tenant settings:

Dataflows_TenantSettings.jpg
 

To be able to “bring your own data lake” (i.e., to use options 2 or 3), you need to do two things. The first task is to associate your Azure Data Lake Storage Gen2 account to the Power BI tenant:

DataflowsSettings.jpg
 

Note that there are currently (as of March 2019) some pretty big limitations with the above setting:

  1. You can only associate on ADLS Gen2 account for your entire Power BI tenant.

  2. It requires an ADLS Gen2 filesystem called “powerbi” which means we cannot point to just any folder in the data lake.

  3. Once the ADLS Gen2 account is associated to Power BI, it cannot be changed. So be very careful with POCs and such until this becomes more flexible.

The second required task to make options 2 or 3 work is a workspace-specific storage setting for every workspace where dataflows might exist:

Dataflows_WorkspaceStorageSetting.jpg

If the workspace storage setting for dataflows is set to No, then the dataflows in that workspace are using option 1 (which means they still work just fine, you just won’t see the data in your data lake even if you’ve set the data lake account in your tenant settings). If the workspace storage setting is set to Yes, then you are free to use options 2 or 3.

To facilitate the above workspace storage setting, there is one more property to be aware of. In the same pane as where we specify the ADLS Gen2 account, there is a Yes/No toggle to allow workspace admins to assign workspaces to the storage account:

DataflowSettings_WorkspaceOwners.jpg
 

At this time the workspace default for dataflows storage is set to No, so it’s easy to forget — I’m hoping that default changes to Yes if there’s a data lake specified in the tenant settings.

Requirements for ‘Bring Your Own Data Lake’

Here’s a final summary of the requirements to be able to implement option 2 or 3:

Dataflows_RequirementsForBringYourOwnDataLake.jpg

How Permissions Work for a Power BI Service Administrator

A Power BI administrator is a role for managing various aspects of the Power BI Service. This role can be assigned in Office 365. Anyone with Office 365 global admin privileges is also a Power BI administrator by default.

Based on the tests I've been doing, I've observed that users with membership to the Power BI administrator role have two sets of permissions apply:

  • Activities which are scoped across the entire organization

  • Activities for which normal user permissions apply

Within the above 2 categories, I’m thinking there are 4 main types of activities:

  1. Manage tenant settings (always scoped to the organization)

  2. Compile inventory and metadata (can be scoped to the organization)

  3. Manage workspace users (can be scoped to the organization)

  4. Export content from a workspace (relies on user permissions)

PowerBIAdministrator_CategoriesOfResponsibility.jpg

My motivation for laying this out is because initially I expected that all of the PowerShell scripts (including the Export-PowerBIReport cmdlet discussed below) would apply organization-wide. This made me think that the Power BI administrator role had become a highly elevated role which could access all content in the Power BI tenant. However, my initial expectation was wrong in that respect — it turns out that a Power BI administrator can access all metadata but not all of the actual data.

Next let’s briefly review each of the 4 types of activities.

Manage Power BI Tenant Settings

The ability to manage tenant settings in the Power BI Admin Portal, has been in place for some time now. It includes managing things such as:

PowerBIAdminPortalTenantSettings.jpg
  • Tenant settings (for the most part this includes enabling/disabling certain features to influence the user experience and/or govern the system)

  • Capacity

  • Embed codes

  • Organizational custom visuals

The Power BI administrator role cannot be delegated to individual subsets of the organization — it applies to the entire tenant.

The role also cannot be granted in a read-only way. This can make it challenging in a very large organization. For example, let's say you're a large worldwide organization with five main divisions. One of the key Power BI people from division A requests access to the Power BI Admin Portal because they want to be able to view what the settings are. An example I've seen of this is someone thinking that the 'push apps to end users' doesn't work, when really the issue is that it's disabled by default in the tenant settings.

Compile Power BI Inventory and Metadata

With the introduction of the Power BI Management Module, we can more easily run scripts to perform certain activities such as accessing metadata. There are several cmdlets available, for instance: Get-PowerBIDashboard, Get-PowerBIReport, and Get-PowerBIWorkspace.

Here’s an example of a script which looks across the entire tenant (i.e., the organization scope) to find all instances of a report named Product Sales Analysis:

Get-PowerBIReport.jpg

The key point above is that the Power BI administrator can retrieve all metadata like this, including My Workspace for other users. This is actually great because it means a Power BI administrator can put together an inventory of the content in the tenant. If you compare this to usage data from the Office 365 audit log, you could determine what isn’t being used.

Manage Workspace Users

There are PowerShell cmdlets such as Add-PowerBIWorkspaceUser and Remove-PowerBIWorkspaceUser to manage the new type of workspaces (i.e., the V2 ‘new workspace experience’ that is in preview as of this writing in late 2018).

Here is an example of my Power BI administrator account providing member permissions to a colleague:

Add-PowerBIWorkspaceUser.jpg

The interesting part of the above example is that my Power BI administrator account does *not* have any direct permissions to the workspace. However, the organization scope allows it to be done.

Export Power BI Content from Workspaces

There is a PowerShell cmdlet called Export-PowerBIReport which, as the name implies, exports a PBIX from the Power BI Service. This includes the report and the underlying data.

Initially I thought this would also be able to be done with an organization-wide scope, but that’s not true. (Though if that were true, that would open up some interesting scenarios like exporting files to make backups, and/or exporting files to minimize risk that critical data is being delivered from a user's individual workspace…but that’s not possible right now. And if it were, I would want that highly privileged role to be separate somehow from the existing Power BI administrator role.)

Here is an example of exporting one file:

Export-PowerBIReport.jpg

Unlike the previous two examples, Export-PowerBIReport does require the Power BI administrator to have rights to the workspace in order to access the content. This is what I was observing, so I was happy to have it confirmed by Chaz Beck (CodeCyclone) from the Power BI product team when he replied to this GitHub issue. This is a legitimate unauthorized (401) message when a Power BI administrator tries to export a PBIX that resides somewhere that the administrator doesn’t have access to - this includes My Workspace for all other users.

One limitation to this: reports and dashboards created directly in the Power BI Service cannot be exported in this manner, which means this method cannot be used for migrating content across tenants.

Also, this cmdlet appears to only work with the new workspace experience that is currently in public preview. I get an unauthorized (401) message, even if I’m an admin on the workspace itself, if I try this on a V1 workspace. This is confirmed as a bug on this GitHub issue.

Summary

Hopefully this post saves you some time in determining how permissions apply to the different types of activities that a Power BI administrator can do.

 
PowerBIAdministrator.jpg
 

The scripts shown above are super simplified, not really ready for actual production use, but I kept them simple since the focus of this post is on permissions.

Also, keep in mind that you do *not* have to be a designated Power BI administrator to use the Power BI PowerShell cmdlets — any user can run them related to their own content. However, you *do* need to be a Power BI administrator in order to set the scope to organization (for those cmdlets which support it).

To find additional information:

You Might Also Like

Terminology Check - What is a Power BI App?

Lesson Learned - Keep PowerShell Modules Consistent and Up To Date

Checklist for Finalizing a Model in Power BI Desktop

Terminology Check - What are Data Flows?

It's another terminology post! Earlier this week I was having a delightful lunch with Angela Henry, Kevin Feasel, Javier Guillen, and Jason Thomas. We were chatting about various new things. Partway thru our conversation Jason stops me because he thought I was talking about Power BI Dataflows when I was really talking about Azure Data Factory Data Flows. It was kind of a funny moment actually but it did illustrate that we have some overlapping terminology coming into our world.

So, with that inspiration, let's have a chat about some of the new data flow capabilities in the Microsoft world, shall we?

Azure Data Factory Data Flow

The new Azure Data Factory (ADF) Data Flow capability is analogous to those from SSIS: a data flow allows you to build data transformation logic using a graphical interface. A really interesting aspect about ADF Data Flows is that they use Azure Databricks as the runtime engine underneath -- however, you don't actually have to know Spark or Databricks in order to be able to use ADF Data Flows. The goal is for it to be a low code/no code way to transform data at scale.

Follow Mark Kromer and the ADF team on Twitter to stay up to date on the rollout of the preview.

 

More info on ADF Data Flow can be found here: https://aka.ms/adfdataflowdocs

Power BI Dataflows

Power BI Dataflows (yes, this one is branded as one word) are a new type of object in a Power BI Workspace which will allow you to load data into a Common Data Model. Data is loaded via a web-based version of Power Query, which is why this capability is referred to as self-service data prep. The resulting data is stored in Azure Data Lake Storage Gen 2. Once in the Common Data Model in the data lake, it can be reused among various Power BI datasets -- allowing the data load, transformations, and cleansing to be done once rather than by numerous PBIX files. This capability was known for a little while during the private preview as Power BI Datapools or as 'Common Data Service for Analytics' (CDS-A) -- but the final name looks like it's going to be Power BI Dataflows.

It's still early so there's not a lot of info available online yet. James Serra wrote up a nice summary and has a few links on his blog. Also, here's a diagram that Chris and I included in the recently updated whitepaper Planning a Power BI Enterprise Deployment which shows our initial understanding of the Power BI Dataflows capability:

Note that Pro users can use Power BI Dataflows without requiring Premium. However, my hunch is that this capabililty will be most appealing for data at scale - i.e., the features that Premium offers with respect to Power BI Dataflows will be pretty compelling, which is why Premium is depicted in the diagram above.

SSIS Data Flow

Data flows have long been a key part of SQL Server Integration Services (SSIS) for data transformations, just like the new capability being added to ADF discussed above. As of Azure Data Factory V2, we can also host and execute SSIS packages in Azure from ADF V2. 

Microsoft Flow

Just for completeness I'll cover one more product which is similarly named. Flow is an Office 365 service for workflow automation between services. It can be used in conjunction with PowerApps and Power BI for different types of workflow automation. Flow lets you do things like approval requests, sending an e-mail alert, or creating a task in a project management system.

Now you know there are multiple types data flows being launched into the world of Microsoft BI (in addition to the good old SSIS data flows we've had forever). Now you can cleverly watch out for which one is being bantered about in your techie conversations. 

You Might Also Like...

Terminology Check - What is a Power BI App?

 

Terminology Check - What is a Power BI App?

Thumbnail.jpg

Let's say you just heard someone mention a Power BI app. What exactly do they mean by that? Well, it depends. The term "app" is used kind of a lot in the Power BI world. So, here's a quick reference to help you decode the conversation. I'm going to start with the most likely options, working down to other options. Which one someone is referring to really depends on their role and their level of familiarity with the Power BI ecosystem.

Power BI App

A Power BI App is a packaged up set of content in the web-based Power BI Service. Related reports, workbooks, dashboards, and datasets are published from an App Workspace into an App for users to consume. 

Power BI App Workspace

An App Workspace in the Power BI Service is where reports, workbooks, dashboards, and datasets are saved, and where data refresh schedules and other settings are defined. An App Workspace is suited to development & collaboration with coworkers (whereas My Workspace is a private area). Smaller teams might do everything they need to do within an App Workspace, whereas larger teams use an App Workspace as the collaboration area for content before it gets published to a Power BI App for consumption. You can have quite a few App Workspaces, depending on how you organize content (for instance, by subject area, by project, by department, or by type of analysis). 

Power BI Mobile App

There are iOS, Android, and Windows mobile apps for consuming Power BI content. In addition to displaying content from the Power BI Service, the mobile apps can also display content from SQL Server Reporting Services and Power BI Report Server. 

Power BI Desktop Application

Power BI Desktop is a client application which is installed on a user's PC. Its purpose is for creating queries, data models, relationships, calculations, and reports for Power BI. Power BI Desktop can be downloaded from the web. However, it's recommended to use the Windows Store instead because updates are installed automatically, even if you don't have admin rights on your machine. The automatic updates are very helpful because Power BI Desktop is updated once per month, as well as bug fixes here and there.

PowerApps

There are three tools in the Business Applications Group currently: Power BI, Flow, and PowerApps. PowerApps is an Office 365 feature that allows you to pretty easily build line-of-business applications with low code or no code. There are lots of possibilities for integration between these three products. For instance, you can display a Power BI report in a PowerApps app, or you can display a PowerApps input screen within a Power BI dashboard, or you can have a Power BI alert trigger a Flow which causes something else to happen in a workflow. 

AppSource

AppSource is like a marketplace to search for line-of-business applications for Power BI, Office 365, Dynamics 365, as well as other products and services. Published offerings can be specific to your organization (such as a Power BI App discussed above), from third parties (like Salesforce), or from partner companies (such as my employer, BlueGranite). 

Azure Active Directory App

If you intend to use Power BI Embedded, you'll need to register an AAD App. This AAD App will be used as the master account (like a service account). This account will have a Power BI license and will have permissions assigned to it to manage the Power BI tenance via the Power BI REST APIs. 

You Might Also Like...

Checklist for Finalizing a Data Model in Power BI Desktop

Why the Default Summarization Property in Power BI is So Important