Lesson Learned - Keep PowerShell Modules Consistent and Up To Date

This is a quick post to share something that happened on a project recently. We began to experience some intermittent issues with Azure Data Factory (V1), which ultimately turned out to be related to an out-of-date AzureRM PowerShell module. What does ADF have to do with the AzureRM PowerShell module, you ask? For this project, the nightly loads are triggered by a signal file which indicates the source data is ready, so we have a PowerShell script that controls the whole end-to-end process (overriding the ADF V1 built-in scheduling). The script looks for the signal file and then resumes the Azure Data Factory pipelines (via the Resume-AzureRmDataFactoryPipeline cmdlet), which causes each pipeline to execute immediately. When all pipelines are finished, they are suspended again (via the Suspend-AzureRmDataFactoryPipeline cmdlet) until the next execution of the data load process. This PowerShell process runs on a virtual machine.
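To make the flow concrete, here's a minimal sketch of that orchestration pattern (not our actual production script). The resource group, data factory, pipeline names, and signal file path below are placeholders.

    # Assumes you're already logged in to Azure (Login-AzureRmAccount) with the AzureRM modules available.
    # Placeholder values - substitute your own resource group, data factory, pipelines, and signal file.
    $resourceGroupName = "MyResourceGroup"
    $dataFactoryName   = "MyDataFactoryV1"
    $pipelineNames     = @("StageSourceData", "TransformData", "LoadDW")
    $signalFilePath    = "D:\Signals\SourceDataReady.txt"

    if (Test-Path $signalFilePath)
    {
        # Resuming an ADF V1 pipeline causes it to execute.
        foreach ($pipelineName in $pipelineNames)
        {
            Resume-AzureRmDataFactoryPipeline -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $pipelineName
        }

        # ...wait for the pipelines to finish (monitoring logic omitted)...

        # Suspend everything again until the next signal file arrives.
        foreach ($pipelineName in $pipelineNames)
        {
            Suspend-AzureRmDataFactoryPipeline -ResourceGroupName $resourceGroupName -DataFactoryName $dataFactoryName -Name $pipelineName
        }
    }

The real script also waits for the pipelines to complete and handles errors before suspending; the sketch only shows the resume/suspend skeleton.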

In production recently, we started seeing cases of ADF pipelines that wouldn't resume, which resulted in some data not getting loaded. The failures were inconsistent and intermittent, and we couldn't reproduce them in UAT with the exact same PowerShell script. The issue occurred only in production, and not on any predictable schedule.

My colleague on the project, Terry Crist, did some investigation and found that the AzureRM module installed on UAT was much newer than what was installed on production. Sure enough, once the AzureRM module was updated in production everything began to run reliably again.

So, this served as a good reminder to ensure that (a) UAT and production environments are running the same PowerShell module version, and (b) preferably the latest version is installed when possible. In environments that don't have a full-time DBA looking after this sort of thing, it's good for developers to know to watch out for these types of issues too. A quick check like the one sketched below will show what each environment is running.
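Running something along these lines on each server shows what's installed. Note that Update-Module assumes the AzureRM module was originally installed from the PowerShell Gallery via Install-Module, and it may require admin rights depending on the install scope.

    # List the AzureRM module version(s) installed on this machine.
    Get-Module -ListAvailable -Name AzureRM | Select-Object Name, Version

    # Pull down the latest version from the PowerShell Gallery.
    Update-Module -Name AzureRM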

You Might Also Like...

Find Pipelines Currently Running in Azure Data Factory with PowerShell

PowerShell for Assigning and Querying Tags in Azure

New eBook - Data Lakes in a Modern Data Architecture

This is a quick announcement of a new resource now available on the BlueGranite site.

If you're interested in data lakes, you might want to check out an updated ebook just published to the BlueGranite site. It's called "Data Lakes in a Modern Data Architecture." This ebook was originally published about 3 years ago by Chris Campbell. If you saw the original, you'll note we retained the same 'look and feel' but about 90% of the content has been updated.

[Image: data lake vs. data warehouse comparison]

I wrote the updated content from a practical point of view, totally hype-free. The table of contents:

  • Modern Data Architecture
  • Business Needs Driving Data Architectures to Evolve and Adapt
  • Principles of a Modern Data Architecture
  • Data Lake + Data Warehouse: Complementary Solutions
  • Tips for Designing a Data Lake
  • Azure Technologies for Implementing a Data Lake
  • Considerations for a Successful Data Lake in the Cloud
  • Getting Started with a Data Lake

To download the ebook, BlueGranite will ask you to register your information. That's common for premium content like this. We take a low-key approach to sales, so I can assure you that registration only means you'll receive notifications of new content you may find interesting.

I'll be updating the ebook from time to time. For example, it already needs an update to reflect the changes introduced with Azure Data Lake Storage Gen2.

At 23 pages, the ebook only begins to explore the many considerations involved. To see if we can help you dive in deeper (pun intended), please contact us. I hope you enjoy reading the ebook as much as I enjoyed writing it.

Terminology Check - What is a Power BI App?

Let's say you just heard someone mention a Power BI app. What exactly do they mean by that? Well, it depends. The term "app" is used quite a lot in the Power BI world, so here's a quick reference to help you decode the conversation. I'm going to start with the most likely options and work down to the less common ones. Which one someone is referring to really depends on their role and their level of familiarity with the Power BI ecosystem.

Power BI App

A Power BI App is a packaged-up set of content in the web-based Power BI Service. Related reports, workbooks, dashboards, and datasets are published from an App Workspace into an App for users to consume.

Power BI App Workspace

An App Workspace in the Power BI Service is where reports, workbooks, dashboards, and datasets are saved, and where data refresh schedules and other settings are defined. An App Workspace is suited to development & collaboration with coworkers (whereas My Workspace is a private area). Smaller teams might do everything they need to do within an App Workspace, whereas larger teams use an App Workspace as the collaboration area for content before it gets published to a Power BI App for consumption. You can have quite a few App Workspaces, depending on how you organize content (for instance, by subject area, by project, by department, or by type of analysis). 

Power BI Mobile App

There are iOS, Android, and Windows mobile apps for consuming Power BI content. In addition to displaying content from the Power BI Service, the mobile apps can also display content from SQL Server Reporting Services and Power BI Report Server. 

Power BI Desktop Application

Power BI Desktop is a client application that is installed on a user's PC. It's used for creating queries, data models, relationships, calculations, and reports for Power BI. Power BI Desktop can be downloaded from the web. However, it's recommended to use the Windows Store instead because updates are installed automatically, even if you don't have admin rights on your machine. The automatic updates are very helpful because Power BI Desktop is updated once per month, with occasional bug-fix releases in between.

PowerApps

There are currently three tools in the Business Applications Group: Power BI, Flow, and PowerApps. PowerApps is an Office 365 feature that allows you to build line-of-business applications fairly easily, with low code or no code. There are lots of possibilities for integration between these three products. For instance, you can display a Power BI report in a PowerApps app, or you can display a PowerApps input screen within a Power BI dashboard, or you can have a Power BI alert trigger a Flow which causes something else to happen in a workflow.

AppSource

AppSource is a marketplace for finding line-of-business applications for Power BI, Office 365, Dynamics 365, and other products and services. Published offerings can be specific to your organization (such as a Power BI App, discussed above), from third parties (like Salesforce), or from partner companies (such as my employer, BlueGranite).

Azure Active Directory App

If you intend to use Power BI Embedded, you'll need to register an AAD App. This AAD App will be used as the master account (like a service account). This account will have a Power BI license and will have permissions assigned to it to manage the Power BI tenant via the Power BI REST APIs.

You Might Also Like...

Checklist for Finalizing a Data Model in Power BI Desktop

Why the Default Summarization Property in Power BI is So Important

Including File Properties and Metadata in a U-SQL Script

When working on big data systems, it can be very helpful to include file properties and other metadata directly within the data results. Capturing data lineage can come in very handy, especially when reconciling or troubleshooting issues (for instance, if retry logic occurred in the data stream and you now have duplicate rows to handle).

I just learned we have some new U-SQL syntax which supports the following file properties:

  • URI (uniform resource identifier)
  • Modified date
  • Created date
  • Length (file size in bytes)

In the following example, I'm using U-SQL (Azure Data Lake Analytics) to iterate over files in date-partitioned subfolders under Raw Data within Azure Data Lake Store. As part of the schema-on-read definition of the source files (aka the extract statement), the new file properties are shown in yellow:

[Image: U-SQL extract statement with the new file property columns highlighted in yellow]
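Since the screenshot may not come through here, the extract statement looks roughly like the sketch below. The folder structure and regular columns are made up for illustration, and the FILE.URI() / FILE.MODIFIED() / FILE.CREATED() / FILE.LENGTH() intrinsics reflect my reading of the release notes, so verify the exact syntax there.

    // Sketch only: hypothetical file set pattern and columns.
    @rawData =
        EXTRACT CustomerId       int,
                SaleAmount       decimal,
                FileDate         DateTime,           // virtual column from the folder pattern
                FileName         string,             // virtual column from the folder pattern
                FileURI          = FILE.URI(),       // new file property columns
                FileModifiedDate = FILE.MODIFIED(),
                FileCreatedDate  = FILE.CREATED(),
                FileSizeBytes    = FILE.LENGTH()
        FROM "/RawData/{FileDate:yyyy}/{FileDate:MM}/{FileDate:dd}/{FileName}.csv"
        USING Extractors.Csv();

The file property values then flow through the rest of the script like any other column, which is what the output below shows.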

The output for the virtual columns looks like this:

[Image: query output showing the file property virtual columns]

You can find more info about this in the U-SQL release notes on GitHub.

Like This Content?

If you are integrating data between Azure services, you might be interested in an all-day session Meagan Longoria and I are presenting at PASS Summit in November. It's called "Designing Modern Data and Analytics Solutions in Azure." Check out info here: http://www.pass.org/summit/2018/Sessions/Details.aspx?sid=78885 

You Might Also Like...

Querying Data in Azure Data Lake Store with Power BI

Granting Permissions in Azure Data Lake

Zones in a Data Lake

Two Ways to Approach Federated Queries with U-SQL and ADLA