Controlling Data Access in Azure for Administrators and Owners

Recently a customer expressed concern that an owner of an Azure resource group automatically gains access to the data within the services contained in the resource group. In this case, the customer was specifically referring to data in Azure Data Lake Storage Gen 1, but this concept applies to Azure Storage and other data-oriented services in Azure as well. The customer’s comment prompted me to look into available alternatives. This is by no means a detailed security post; rather, I’m trying to share a few nuggets of what I learned.

Key points to take away from this post:

  • The built-in Owner role is, by design, unrestricted for both management operations and data operations.

  • You cannot 'break' the inheritance model of resources in Azure.

  • If resource group owners are not allowed to see data, these are the choices I’m currently aware of:

  1. Remove owner permissions from the resource group level, or

  2. Isolate the resource into its own resource group so it can be managed separately, or

  3. Investigate using the new "DataActions" and "NotDataActions" properties within the RBAC role definition structure to separate management operations from data operations (in preview at the time of this writing)

Default Behavior of an Owner in a Resource Group Allows Access to Data

Let's first cover what happens by default. Say I have a resource group with a way cool name of "SecurityTestRG" and I've granted "Wayne Writer" ownership permissions to the resource group. And within the resource group I have a storage account called "strgtest96." As we'd expect, the owner permissions are inherited by the storage account:

Security_ResourceInherited.jpg

And with that, Wayne Writer is able to view, edit, and delete any of the data that has been uploaded to that storage account by virtue of his owner permissions at the resource group level:

Security_OwnerOfResourceGroup.jpg

Allowing owners unrestricted access to both the management plane and the data plane is by design. However, if you have a group of people who should be able to administer (own) everything contained in the resource group, but you don’t want them to automatically see all the data, what are the options?

Cannot Break the RBAC Inheritance Model in Azure

The first thing you might be inclined to try is to just remove the owner permission for the one resource within the resource group that has data, so that you can specify security for it differently. However, that doesn’t work:

Security_RoleAssignmentsError.jpg

When you try to delete a role assignment that’s been inherited, as shown above, you get an error message: “Inherited role assignments cannot be removed. Please open up the associated resource and remove the role assignments from there.”

In Azure’s RBAC model, we can add additional permissions at lower levels (e.g., for an individual resource within the resource group), but we cannot remove an assignment that’s been inherited.
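
To make the inheritance rule concrete, here is a minimal Python sketch of the idea. It is a toy model, not the Azure SDK or the actual RBAC engine: the RoleAssignment class, both functions, and the subscription ID "sub1" are hypothetical names I made up for illustration. It only shows that an assignment made at a parent scope applies to every child scope, and that it can be removed only at the scope where it was made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    """Hypothetical stand-in for a role assignment (not an Azure SDK type)."""
    principal: str
    role: str
    scope: str


def _normalize(scope):
    # Scope comparison is treated as case-insensitive in this sketch.
    return scope.lower().rstrip("/")


def effective_assignments(assignments, resource_scope):
    """Return assignments that apply at resource_scope, directly or by inheritance."""
    target = _normalize(resource_scope)
    result = []
    for assignment in assignments:
        scope = _normalize(assignment.scope)
        # A parent scope is a path prefix of its child scopes.
        if target == scope or target.startswith(scope + "/"):
            result.append(assignment)
    return result


def can_remove_at(assignment, scope):
    """An assignment can only be removed at the scope where it was made."""
    return _normalize(assignment.scope) == _normalize(scope)


# "sub1" is a made-up subscription ID; the other names come from the example above.
RG = "/subscriptions/sub1/resourceGroups/SecurityTestRG"
STORAGE = RG + "/providers/Microsoft.Storage/storageAccounts/strgtest96"

assignments = [RoleAssignment("Wayne Writer", "Owner", RG)]
inherited = effective_assignments(assignments, STORAGE)
```

In this sketch, `inherited` contains Wayne Writer's Owner assignment even though it was made on the resource group, and `can_remove_at(inherited[0], STORAGE)` is False, which mirrors the error message from the portal.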

An additional note about subscription level owners & administrators: Any owner you specify at the subscription level will be inherited by all resource groups and resources across the subscription—it’s a very high privilege role. The same inheritance behavior also applies to the Azure service administrator (but not the other co-administrators—they would have to be specified as subscription owners).

Options for Handling Security Differently

Option 1. You could remove the owner permission from the resource group (that is the suggestion presented within the error message shown above). However, let’s say there are 10 resources in this resource group and the other 9 of them can inherit the owner permission without issue. In that case, do we really want to have to manage security for every single resource separately? That might introduce risk and inconsistency. That said, if there aren’t very many resources within the resource group, this approach might work OK.

Option 2. You could isolate the resource into its own ‘sensitive’ resource group so it can be managed separately. This allows the other 9 resources to stay in the main resource group with normal RBAC, and just segregates this one resource. Depending on how you handle automation and deployments, having separate resource groups could add complications (because the general rule for resource groups is to group together resources that are related and have the same lifecycle). Also, it still precludes you from being able to own the resource yet not see data; this segregation would work if contributor or a custom role meets your needs.

Option 3. There is new functionality in preview that looks promising which will allow us to handle security for management operations and data operations separately, as shown here:

Making the above scenario work would involve using the "DataActions" and "NotDataActions" properties within the RBAC role definition structure to separate management operations from data operations. More info on using option 3 (still in preview at the time of this post) can be found here: https://docs.microsoft.com/en-us/azure/role-based-access-control/role-definitions. As of now, this capability looks to be available only via ARM template and not via the user interface in the portal. I’m guessing this security enhancement is inspired by GDPR and/or customer needs for more granularity of permissions.
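
Here is a rough Python sketch of how the four properties in a role definition fit together. Per the role definitions documentation linked above, management permissions are Actions minus NotActions, and data permissions are DataActions minus NotDataActions. The two role dictionaries and the is_allowed helper are hypothetical examples of mine, not built-in roles or an Azure API, and the wildcard matching is a simplification of what Azure actually does.

```python
from fnmatch import fnmatchcase

BLOBS = "Microsoft.Storage/storageAccounts/blobServices/containers/blobs"

# Hypothetical custom role: can manage the storage account but has no data actions.
manage_only = {
    "Name": "Storage Manager Without Data (hypothetical)",
    "Actions": [
        "Microsoft.Storage/storageAccounts/read",
        "Microsoft.Storage/storageAccounts/write",
    ],
    "NotActions": [],
    "DataActions": [],
    "NotDataActions": [],
}

# Hypothetical custom role: can read and write blob data but not delete it.
data_reader_writer = {
    "Name": "Blob Reader and Writer (hypothetical)",
    "Actions": [],
    "NotActions": [],
    "DataActions": [BLOBS + "/*"],
    "NotDataActions": [BLOBS + "/delete"],
}


def _matches(patterns, operation):
    # Simplified matching: '*' stands for any run of characters, case-insensitive.
    op = operation.lower()
    return any(fnmatchcase(op, pattern.lower()) for pattern in patterns)


def is_allowed(role, operation, data_plane):
    """Allowed = matched by the allow list and not matched by the exclusion list."""
    if data_plane:
        allow, exclude = role["DataActions"], role["NotDataActions"]
    else:
        allow, exclude = role["Actions"], role["NotActions"]
    return _matches(allow, operation) and not _matches(exclude, operation)
```

With these definitions, `is_allowed(manage_only, BLOBS + "/read", data_plane=True)` is False while `is_allowed(manage_only, "Microsoft.Storage/storageAccounts/write", data_plane=False)` is True, which is the separation of management operations from data operations that option 3 is after. The sketch only evaluates a single role definition; it does not model inheritance or multiple assignments.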

Finding More Information

Azure Storage Security Guide (a good read on the management plane and data plane)

What is Role-Based Access Control (RBAC)?

Understand Role Definitions

Like This Content?

If you’re getting started building data-oriented solutions in Azure, you might be interested in attending one of the sessions Meagan Longoria and I are giving soon:

Designing Azure Data and Analytics Solutions - SQLTrail - Richmond, VA - Oct 12, 2018

Designing Modern Data and Analytics Solutions in Azure - PASS Summit - Seattle, WA - Nov 5, 2018

Terminology Check - What are Data Flows?

It's another terminology post! Earlier this week I was having a delightful lunch with Angela Henry, Kevin Feasel, Javier Guillen, and Jason Thomas. We were chatting about various new things. Partway through our conversation Jason stopped me because he thought I was talking about Power BI Dataflows when I was really talking about Azure Data Factory Data Flows. It was kind of a funny moment actually, but it did illustrate that we have some overlapping terminology coming into our world.

So, with that inspiration, let's have a chat about some of the new data flow capabilities in the Microsoft world, shall we?

Azure Data Factory Data Flow

The new Azure Data Factory (ADF) Data Flow capability is analogous to the data flow in SSIS: a data flow allows you to build data transformation logic using a graphical interface. A really interesting aspect of ADF Data Flows is that they use Azure Databricks as the runtime engine underneath -- however, you don't actually have to know Spark or Databricks in order to be able to use ADF Data Flows. The goal is for it to be a low code/no code way to transform data at scale.

Follow Mark Kromer and the ADF team on Twitter to stay up to date on the rollout of the preview.

 

More info on ADF Data Flow can be found here: https://aka.ms/adfdataflowdocs

Power BI Dataflows

Power BI Dataflows (yes, this one is branded as one word) are a new type of object in a Power BI Workspace which will allow you to load data into a Common Data Model. Data is loaded via a web-based version of Power Query, which is why this capability is referred to as self-service data prep. The resulting data is stored in Azure Data Lake Storage Gen 2. Once in the Common Data Model in the data lake, it can be reused among various Power BI datasets -- allowing the data load, transformations, and cleansing to be done once rather than by numerous PBIX files. This capability was known for a little while during the private preview as Power BI Datapools or as 'Common Data Service for Analytics' (CDS-A) -- but the final name looks like it's going to be Power BI Dataflows.

It's still early so there's not a lot of info available online yet. James Serra wrote up a nice summary and has a few links on his blog. Also, here's a diagram that Chris and I included in the recently updated whitepaper Planning a Power BI Enterprise Deployment which shows our initial understanding of the Power BI Dataflows capability:

Note that Pro users can use Power BI Dataflows without requiring Premium. However, my hunch is that this capability will be most appealing for data at scale - i.e., the features that Premium offers with respect to Power BI Dataflows will be pretty compelling, which is why Premium is depicted in the diagram above.

SSIS Data Flow

Data flows have long been a key part of SQL Server Integration Services (SSIS) for data transformations, just like the new capability being added to ADF discussed above. As of Azure Data Factory V2, we can also host and execute SSIS packages in Azure.

Microsoft Flow

Just for completeness I'll cover one more product which is similarly named. Flow is an Office 365 service for workflow automation between services. It can be used in conjunction with PowerApps and Power BI for different types of workflow automation. Flow lets you do things like approval requests, sending an e-mail alert, or creating a task in a project management system.

Now you know there are multiple types of data flows being launched into the world of Microsoft BI (in addition to the good old SSIS data flows we've had forever). Now you can cleverly watch out for which one is being bantered about in your techie conversations.

Lesson Learned - Keep PowerShell Modules Consistent and Up To Date

This is a quick post to share something that happened on a project recently. We began to experience some intermittent issues with Azure Data Factory (V1), and they were ultimately traced to an out-of-date AzureRM PowerShell module. What does ADF have to do with the AzureRM PowerShell module, you ask? For this project, the nightly loads are triggered by a signal file which indicates the source data is ready. Therefore, we have a PowerShell script that controls the whole end-to-end process (which overrides the ADF V1 built-in scheduling). The PowerShell script looks for the signal file and then proceeds to resume the Azure Data Factory pipelines (via the Resume-AzureRmDataFactoryPipeline cmdlet). The resume causes each pipeline to execute immediately. When all pipelines are finished, they all get suspended again (via the Suspend-AzureRmDataFactoryPipeline cmdlet) until the next execution of the data load process. This PowerShell process runs on a virtual machine.
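
The control flow described above can be sketched roughly as follows. This is Python rather than our actual PowerShell script, and it is only an outline under assumptions: the resume, suspend, and is_finished callables are hypothetical stand-ins for whatever wraps the Resume-AzureRmDataFactoryPipeline and Suspend-AzureRmDataFactoryPipeline cmdlets and the pipeline status check, and the polling parameters are invented for illustration.

```python
import time
from pathlib import Path


def run_nightly_load(signal_file, pipelines, resume, suspend, is_finished,
                     poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Resume all pipelines once the signal file exists, then suspend them again.

    resume, suspend, and is_finished are caller-supplied callables that take a
    pipeline name (hypothetical wrappers, not real Azure functions).
    Returns True if every pipeline finished, False otherwise.
    """
    if not Path(signal_file).exists():
        # Source data is not ready yet; leave the pipelines suspended.
        return False

    for name in pipelines:
        resume(name)  # in ADF V1, resuming causes the pipeline to execute

    try:
        for _ in range(max_polls):
            if all(is_finished(name) for name in pipelines):
                return True
            sleep(poll_seconds)
        return False  # gave up waiting
    finally:
        # Suspend everything again until the next load, even after a timeout.
        for name in pipelines:
            suspend(name)
```

Suspending in the finally block is a design choice of this sketch: it keeps the pipelines from being left in a resumed state if the wait times out or an error is raised partway through.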

In production recently, we started seeing cases of ADF pipelines that wouldn't resume properly, which resulted in some data not getting loaded. It was inconsistent and intermittent. With the exact same PowerShell script, we couldn't reproduce the issue in UAT. The issues were only occurring in production, but not with precise regularity.

My colleague on the project, Terry Crist, did some investigation and found that the AzureRM module installed on UAT was much newer than what was installed on production. Sure enough, once the AzureRM module was updated in production everything began to run reliably again.

So, this served as a good reminder to ensure that (a) UAT and production environments are running the same PowerShell module version, and (b) preferably the latest version is installed when possible. In environments which don't have a full-time DBA looking after this sort of thing, it's good for developers to know to watch out for these types of issues too.
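
One way to catch this kind of drift early is to compare the installed module versions across environments on a regular basis. Below is a small Python sketch of that comparison. It assumes you have already collected the versions from each machine (for example, from the output of Get-Module -ListAvailable) into a dictionary. The function names are hypothetical, the version numbers are made up for illustration and are not the ones from our project, and the parser assumes purely numeric dotted versions.

```python
def parse_version(text):
    """Turn a dotted version string like '6.8.1' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def find_version_drift(environments):
    """Return {module: {environment: version}} for modules whose versions differ.

    environments maps an environment name to {module name: version string}.
    A module missing from an environment is reported with a version of None.
    """
    modules = set()
    for installed in environments.values():
        modules.update(installed)

    drift = {}
    for module in sorted(modules):
        versions = {env: installed.get(module)
                    for env, installed in environments.items()}
        distinct = {parse_version(v) if v is not None else None
                    for v in versions.values()}
        if len(distinct) > 1:
            drift[module] = versions
    return drift


# Illustrative data only; these are not the actual versions from the project.
environments = {
    "UAT": {"AzureRM": "6.8.1"},
    "Production": {"AzureRM": "4.2.0"},
}
drift = find_version_drift(environments)
```

Here `drift` reports AzureRM with both versions side by side, so the mismatch is visible before it shows up as an intermittent production failure.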

New eBook - Data Lakes in a Modern Data Architecture

This is a quick announcement of a new resource published to the BlueGranite site.

If you're interested in data lakes, you might want to check out the updated ebook. It's called "Data Lakes in a Modern Data Architecture." This ebook was originally published about 3 years ago by Chris Campbell. If you saw the original, you'll note we retained the same 'look and feel' but about 90% of the content has been updated.

I wrote the updated content from a practical point of view, totally hype-free. The table of contents:

  • Modern Data Architecture
  • Business Needs Driving Data Architectures to Evolve and Adapt
  • Principles of a Modern Data Architecture
  • Data Lake + Data Warehouse: Complementary Solutions
  • Tips for Designing a Data Lake
  • Azure Technologies for Implementing a Data Lake
  • Considerations for a Successful Data Lake in the Cloud
  • Getting Started with a Data Lake

To download the ebook, BlueGranite will ask you to register your information. That's common for premium content like this. We take a low-key approach to sales, so I can assure you that registration only means you'll receive notifications of new content that you may find interesting.

I'll be updating the ebook from time to time. For example, it already needs updating to reflect the changes in Azure Data Lake Storage Gen 2.

At 23 pages, the ebook just begins to explore a lot of considerations. To see if we can help you dive in deeper (pun intended), please contact us. I hope you enjoy reading the ebook as much as I enjoyed writing it.