Terminology Check - What is a Power BI App?

Thumbnail.jpg

Let's say you just heard someone mention a Power BI app. What exactly do they mean by that? Well, it depends. The term "app" is used kind of a lot in the Power BI world. So, here's a quick reference to help you decode the conversation. I'm going to start with the most likely options, working down to other options. Which one someone is referring to really depends on their role and their level of familiarity with the Power BI ecosystem.

Power BI App

A Power BI App is a packaged up set of content in the web-based Power BI Service. Related reports, workbooks, dashboards, and datasets are published from an App Workspace into an App for users to consume. 

Power BI App Workspace

An App Workspace in the Power BI Service is where reports, workbooks, dashboards, and datasets are saved, and where data refresh schedules and other settings are defined. An App Workspace is suited to development & collaboration with coworkers (whereas My Workspace is a private area). Smaller teams might do everything they need to do within an App Workspace, whereas larger teams use an App Workspace as the collaboration area for content before it gets published to a Power BI App for consumption. You can have quite a few App Workspaces, depending on how you organize content (for instance, by subject area, by project, by department, or by type of analysis). 

Power BI Mobile App

There are iOS, Android, and Windows mobile apps for consuming Power BI content. In addition to displaying content from the Power BI Service, the mobile apps can also display content from SQL Server Reporting Services and Power BI Report Server. 

Power BI Desktop Application

Power BI Desktop is a client application which is installed on a user's PC. Its purpose is for creating queries, data models, relationships, calculations, and reports for Power BI. Power BI Desktop can be downloaded from the web. However, it's recommended to use the Windows Store instead because updates are installed automatically, even if you don't have admin rights on your machine. The automatic updates are very helpful because Power BI Desktop is updated once per month, as well as bug fixes here and there.

PowerApps

There are three tools in the Business Applications Group currently: Power BI, Flow, and PowerApps. PowerApps is an Office 365 feature that allows you to pretty easily build line-of-business applications with low code or no code. There are lots of possibilities for integration between these three products. For instance, you can display a Power BI report in a PowerApps app, or you can display a PowerApps input screen within a Power BI dashboard, or you can have a Power BI alert trigger a Flow which causes something else to happen in a workflow. 

AppSource

AppSource is like a marketplace to search for line-of-business applications for Power BI, Office 365, Dynamics 365, as well as other products and services. Published offerings can be specific to your organization (such as a Power BI App discussed above), from third parties (like Salesforce), or from partner companies (such as my employer, BlueGranite). 

You Might Also Like...

Checklist for Finalizing a Data Model in Power BI Desktop

Why the Default Summarization Property in Power BI is So Important

Including File Properties and Metadata in a U-SQL Script

When working on big data systems, it can be very helpful to include file properties and other metadata directly within the data results. Capturing data lineage can come in very handy, especially if reconciling or troubleshooting issues (for instance, if retry logic occurred in the data stream and now you have duplicate rows to be handled).

I just learned we have some new U-SQL syntax which supports the following file properties:

  • URI (uniform resource identifier)
  • Modified date
  • Created date
  • Length (file size in bytes)

In the following example, I'm using U-SQL (Azure Data Lake Analytics) to iterate over files which are in date-partitioned subfolders under Raw Data within Azure Data Lake Store. As part of the schema-on-read definition of the source files (aka the extract statement), the new file properties are shown in yellow:

U-SQL_File_Properties.jpg

The output for the virtual columns looks like this:

U-SQL_File_Properties_Output.jpg

You can find more info about this in the release notes on GitHub

Like This Content?

If you are integrating data between Azure services, you might be interested in an all-day session Meagan Longoria and I are presenting at PASS Summit in November. It's called "Designing Modern Data and Analytics Solutions in Azure." Check out info here: http://www.pass.org/summit/2018/Sessions/Details.aspx?sid=78885 

You Might Also Like...

Querying Data in Azure Data Lake Store with Power BI

Granting Permissions in Azure Data Lake

Zones in a Data Lake

Two Ways to Approach Federated Queries with U-SQL and ADLA

Querying Data in Azure Data Lake Store with Power BI

Let's say you have data in Azure Data Lake Store (ADLS) that you want to report directly from in Power BI. You might be doing this for early exploratory data efforts, or you might have some curated data which has been prepared in the data lake for analysis tools such as Power BI. 

The screen shots & functionality discussed for ADLS and Power BI are from early May 2018.

In the example I'm using, my Data Lake Store service is called "bankingadls." The file structure I'm working with looks like this:

ADLS_FileStructure.jpg

We're going to be pulling the data files from the RawData subfolders into Power BI.

What do you need to be able to retrieve ADLS data into Power BI?

  • Read and execute permissions to the data stored in ADLS. I talked about permissions in this blog post
  • The source file(s) need a format which has reliable structure that Power BI can understand.
  • If you're combining multiple files (such as all 3 shown in the screenshot above), they need to have the same structure/file layout for each file. Power BI is pretty smart about this, so if you're missing a column in one file that's present in the other files, it can cope with something simple like that from what I've seen. The rule of thumb when designing a data lake is to use the same data structure within each folder, so hopefully varying data formats aren't a problem--otherwise you'll have to cleanse & standardize the data before it can be analyzed. 

What can you query from ADLS?

You can connect to the data stored in Azure Data Lake Store. What you *cannot* connect to currently is the data stored in the Catalog tables/views/stored procedures within Azure Data Lake Analytics (hopefully connectivity to the ADLA Catalog objects from tools other than U-SQL is available soon--you can vote for Power BI connectivity to the Catalog on this UserVoice suggestion).

You're not sending a U-SQL query here. Rather, we're sending a web API request to an endpoint.

With an ADLS data source, you have to import the data into Power BI Desktop. There is no option for DirectQuery.

Should you use https:// or adl:// to connect to ADLS

In the portal you may notice that you have two ways to connect:

ADLS_URLandURI.jpg

https:// is a WebHDFS-compatible endpoint which allows integration of the ADL Store with open source projects and other languages.

adl:// is the “AzureDataLakeFilesystem” which is also secure, and provides additional performance enhancements on top of WebHDFS. 

I typically use the adl:// endpoint in case it does give me a performance boost. From what I can tell in Fiddler, it looks like both methods send the same GET request from Power BI which looks like this:

ADLS_fromPowerBI_GetRequest.jpg

Connecting to one file in ADLS from Power BI

Connecting to one individual file in ADLS is pretty straightforward. In the Get Data area within Power BI Desktop, we'll choose the Azure Data Lake Store connector:

PowerBI_ADLS.jpg
 

When prompted for the URL, you can put in either the URI or the URL - either will work. For one file, you'll use the full path to that file:

URL_ADLS_OneFile.jpg

Go ahead and respond to the security dialog box when prompted (if you're accessing this ADL Store for the first time). Next let's choose Edit to go into the Query Editor:

PowerBI_ADLS2.jpg

Here's where it might look funny to you at first. What it's actually showing you here is the metadata. To get to the data, click on the hyperlink called Binary:

PowerBI_ADLSData.jpg

And now we've got a preview of the actual file data. Power BI auto-created a few steps to tidy up the data:

PowerBI_ADLSData2.jpg

At this point, you can Close & Apply. Carry on with fine-tuning the data model properties and creating visuals. 

Connecting to a folder containing multiple files from Power BI

Connecting to a folder is helpful when you have numerous files you want to consolidate. In a data lake, it's common to have data partitioned (into subfolders) by date or other logical ways to store the data incrementally. 

There are a couple of differences when retrieving data from the folder level instead. The first difference is we specify the folder path in the URL:

URL_ADLS_Folder.jpg

The next difference is how the metadata is initially presented. The metadata for this "table" is actually the subfolder - for us, that's the 2017 subfolder under RawData. If we click on the Table hyperlink, we are taken to the metadata for the monthly folders:

PowerBI_ADLS3.jpg

At this point, we don't want to navigate down any more because we want files that are present in each of those monthly folders. What we want to do is click on the "Expand" button:

PowerBI_ADLS_CombineFiles.jpg
 

Your steps above might differ a bit depending on your folder structure. 

Now we see that it has detected the presence of the individual files across the multiple folders. Now it's time to click on the "Combine Files" button:

PowerBI_ADLS_CombineFiles4.jpg

The Query Editor uses the first file to determine what the column structure of the data will be:

PowerBI_ADLS_CombineFiles3.jpg

When combining files, be sure to choose "Skip files with errors." I had some null rows along with the binary file list which Power BI saw as file errors so it failed when trying to complete a full data refresh. The error was "Failed to save modifications to the server. The parameter is expected to be of type Text.Type or Binary.Type." I believe what this meant was the sample query & parameter it used in the Query Editor to figure out the metadata was choking on those null values. I'm not certain where the null rows came from, but choosing "Skip files with errors" solved the issue (or you could filter out null values in the Content.Content column shown above).

At this point, you can Close & Apply. Carry on with fine-tuning the data model properties and creating visuals. 

Refreshing the data in the Power BI Service

In order to schedule a data refresh, we'll want to publish the PBIX to the Power BI Service. (Reminder: if you plan to share this with others, lean towards using an app workspace. Sharing out of your personal workspace should be used only in a limited way.)

A gateway is not needed in the Power BI Service since it's pulling from an Azure service.

The data source credentials for the refresh use the OAuth2 protocol which is common for web requests:

PowerBI_ADLS_DataRefresh.jpg
 

Now for the bad news. The OAuth2 protocol relies on access tokens, and those tokens expire at regular intervals. (I'm currently testing this with a new refresh schedule created yesterday; I'll update this post as soon as I have verified what the current expiration interval is--thankfully it is longer than it used to be.) Once the refresh fails due to token expiration, you'll have to edit credentials in the dataset properties. Usually our way around this issue, like when Azure Data Factory needs to access ADLS, is to use an Azure application (service principal) for authentication, but that's not currently supported either. 

Depending on your quantity of files and/or size of files in the data lake, the data refresh may take a bit of time. One https GET request is sent by Power BI per file to retrieve data. 

Like This Content?

If you are integrating data between Azure services, you might be interested in an all-day session Meagan Longoria and I are presenting at PASS Summit in November. It's called "Designing Modern Data and Analytics Solutions in Azure." Check out info here: http://www.pass.org/summit/2018/Sessions/Details.aspx?sid=78885 

You Might Also Like...

Why the Default Summarization Property in Power BI is So Important

Checklist for Finalizing a Data Model in Power BI Desktop

Data Refresh Issues in the Power BI Service Due to Invalid Dates

New All-Day Session: Designing Modern Data and Analytics Solutions in Azure

At the fall 2018 PASS Summit in Seattle, I'm excited to be co-presenting a full day pre-conference session with my good friend & fellow BlueGranite colleague Meagan Longoria

Why Do a Pre-Con?

MelissaCoates.JPG

I have a natural inclination to share information that I have learned. Being a hands-on techie is something I absolutely love, but I have a bit of educator in my blood as well. And, continually learning new skills is at the core of what makes me happy. All of which means that I aim to teach others in a way that I would want to learn. 

What Will You Learn? 

This session will very much be about planning the architecture and factors around decision-making, presented in a very practical and realistic way (full abstract can be found here). We will build the components for one reference architecture, using scripts that we will provide you. 

PreconTime.jpg
 

The full abstract can be found on the PASS Summit site. To highlight just a few of the topics that you'll hear about:

  • Going to the cloud - What's easier? What's harder? What trade-offs can you expect to make with respect to cost, control, complexity, performance, and security?
  • Cloud design patterns - In what ways are cloud design patterns different from traditional on-premises solutions? How does that change the typical roles for developers and administrators?
  • Schema-on-read - In what scenarios does schema-on-read work extremely well? In which situations is it not ideal?
  • Patterns-based development - What automation techniques can save you time, improve efficiency, and reduce the chance for error? 
  • Architecture - What does a BI/analytics/DW architecture look like if we value the concept of polyglot persistence vs. architectural simplicity? What kind of differences should we be aware of if we are using a distributed architecture? What are the Azure options for supporting data science and self-service BI?
  • Data storage - When do we want to analyze data in place vs. move it to another data store? What technology options do we have in Azure, and what factors do we want to consider for deciding between data virtualization and data integration? In what cases can you take advantage of a data lake in your architecture? 

Who is the Target Audience?

The ideal audience member has some experience as a data engineer, BI professional, or database developer, and is in the early stages of migrating or building solutions in Azure.

This session is broad because the data platform offerings in Azure are broad with many choices and considerations. Our day job *is* planning and building data solutions in Azure. Meagan and I are very excited to help you get started with building a solid data architecture in Azure.

More details and to register: Designing Modern Data and Analytics Solutions in Azure