Assigning Data Permissions for Azure Data Lake Store (Part 3)

This is part 3 in a short series on Azure Data Lake permissions. 

Part 1 - Granting Permissions in Azure Data Lake
Part 2 - Assigning Resource Management Permissions for Azure Data Lake Store
Part 3 - Assigning Data Permissions for Azure Data Lake Store {you are here}

In this section, we're covering the "data permissions" for Azure Data Lake Store (ADLS). The ACL (access control list) grants permissions to create, read, and/or modify files and folders stored in the ADLS service. Uploading and downloading data falls in this category of ACLs. If you come from the Unix or Linux world, the POSIX-style ACLs will be a familiar concept.

There are two types of ACLs: Access ACLs and Default ACLs.

An Access ACL specifies the read/write/execute permissions for a folder or file. Every single folder or file has its security explicitly defined -- so that means the ADLS security model is not an 'inheritance' model. That is an important concept to remember.

A Default ACL is like a 'template' setting at a folder level (the concept of a default doesn't apply at the file level). Any new child item placed in that folder will automatically obtain that default security setting. The default ACLs are absolutely critical, given that data permissions aren't an inheritance model. You want to avoid a situation where a user has permission to read a folder, but is unable to see any of the files within the folder -- that situation will happen if a new file gets added to a folder which has an access ACL set at the folder level, but not a default ACL to apply to new child objects.
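
To see both kinds of entries on an existing folder, you can list its ACL. This is a minimal sketch, assuming the AzureRM.DataLakeStore module is installed and you are already logged in; the account name and path are placeholders rather than values from a real environment.

```powershell
# Placeholder values (assumptions for illustration only)
$resourceName = 'YourResourceName'
$adlsPath     = '/Folder/Subfolder'

# List every ACL entry currently set on the folder.
# Default entries are the 'template' that new child items receive;
# the remaining entries are the access entries for the folder itself.
Get-AzureRmDataLakeStoreItemAclEntry `
    -Account $resourceName `
    -Path $adlsPath |
    Format-Table -AutoSize
```

If a group shows up with an access entry but no matching default entry, new files added to that folder are likely to be invisible to that group.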

Tips for Assigning Data Permissions for ADLS

Organize your data lake folders and files so that folder-level security is one of the main considerations, and so that security is easier to manage.

Access to raw data is typically highly limited. This is partly because raw data is not very usable as-is, and partly so that data can be ingested as quickly as possible. Because every single file in ADLS has its own security properties, this is one of several reasons why a large number of very tiny files in ADLS is generally discouraged.

Typically in a data lake, the majority of users need only read+execute rights to consume the data. However, you may also have an area like a 'manual drop zone' or an 'analytics sandbox' where select users have write permissions to create, modify & delete folders and files. Generally speaking, write permissions in a data lake are minimal.

Be selective with granting permissions at the root level ("/"). It does minimize maintenance if you define an access ACL + default ACL at the root level, but only if you feel confident that is adequate security.

Try to use Azure Active Directory (AAD) groups whenever you can to grant access, rather than individual user accounts. This is a consistent best practice for managing security across many types of systems. It reduces maintenance, and reduces the risk of inaccurate or out-of-date user permissions.

Currently the maximum number of ACL entries that can be assigned to a file or folder is 32. This is another big reason to use AAD groups for managing access, rather than individual users.
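
To get a feel for how close a folder is to that limit, you can count its existing entries. This is a rough sketch with placeholder names; it assumes the objects returned by Get-AzureRmDataLakeStoreItemAclEntry expose a Scope property (access vs. default), which may differ depending on your module version.

```powershell
# Placeholder values (assumptions for illustration only)
$resourceName = 'YourResourceName'
$adlsPath     = '/Folder/Subfolder'

# Wrap in @() so a single returned entry is still treated as an array
$entries = @(Get-AzureRmDataLakeStoreItemAclEntry -Account $resourceName -Path $adlsPath)

Write-Host "Total ACL entries on ${adlsPath}: $($entries.Count)"

# Break the count down by scope (assumes a 'Scope' property on each entry)
$entries | Group-Object -Property Scope | Select-Object Name, Count
```

The listing may include the built-in owner, owning group, and other entries, so treat the number as an indicator rather than an exact measure against the limit.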

Try to assign access at the folder level whenever you can. Although ADLS doesn't have a true inheritance model, you can set a 'default' entry which will handle new child items. 

Be aware that changing a default ACL on a folder doesn't automatically propagate to change the default ACL on any existing child folders. So, managing changes to existing data needs to be done carefully. Although it can seem like the default ACLs act like an inheritance model in some respects, they definitely do not.

Grant access to an AAD application (aka service principal identity or SPI) for automated operations, such as data loads. For service principals, you often can assign just the data permissions (the ACL) and not any permissions to the ADLS service (the RBAC). Check Part 4 for more discussion about service principals.
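
As a hedged sketch of that idea, the snippet below grants a service principal data permissions on a folder. The application name is hypothetical, and it assumes (as I understand the ADLS model) that a service principal is assigned with the User entry type using its object ID. Adjust the permission to whatever your load process really needs.

```powershell
# Placeholder values (assumptions for illustration only)
$resourceName   = 'YourResourceName'
$adlsPath       = '/Folder/Subfolder'
$appName        = 'YourAADApplicationName'   # hypothetical application name
$permissionType = 'All'                      # read + write + execute; narrow this if you can

# Look up the service principal; -SearchString can match more than one
$sp = @(Get-AzureRmADServicePrincipal -SearchString $appName)
if ($sp.Count -ne 1) {
    throw "Expected exactly one service principal matching '$appName', found $($sp.Count)."
}

# Access entry for the folder itself
Set-AdlStoreItemAclEntry `
    -AccountName $resourceName `
    -Path $adlsPath `
    -AceType User `
    -Permissions $permissionType `
    -Id $sp[0].Id

# Default entry so that new child items get the same entry
Set-AdlStoreItemAclEntry `
    -AccountName $resourceName `
    -Path $adlsPath `
    -AceType User `
    -Permissions $permissionType `
    -Id $sp[0].Id `
    -Default
```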

You almost always want to assign read + execute permissions together. The execute permissions allow a user to traverse the folder structure to where a file resides, which is needed in conjunction with the read (or write) permissions for the actual file.
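
Because execute is needed on every folder along the path, one way to think about it is sketched below: work out the parent folders of the target and grant execute-only on each of them. The group name and paths are placeholders. Note that setting an entry replaces any existing entry for the same group on that folder, so don't run something like this against folders where the group already has broader rights.

```powershell
# Placeholder values (assumptions for illustration only)
$resourceName = 'YourResourceName'
$adlsPath     = '/Folder/Subfolder/Data'   # the folder the group needs to read
$groupName    = 'YourAADGroupName'

$groupId = Get-AzureRmADGroup -SearchString $groupName

# Build the list of parent folders: '/', '/Folder', '/Folder/Subfolder'
$segments  = $adlsPath.Trim('/').Split('/')
$ancestors = @('/')
$current   = ''
for ($i = 0; $i -lt $segments.Length - 1; $i++) {
    $current    = "$current/$($segments[$i])"
    $ancestors += $current
}

# Execute-only on each parent folder lets the group traverse down to the target
foreach ($folder in $ancestors) {
    Write-Host "Granting Execute on $folder"
    Set-AdlStoreItemAclEntry `
        -AccountName $resourceName `
        -Path $folder `
        -AceType Group `
        -Permissions 'Execute' `
        -Id $groupId.Id
}

# Read + execute on the target folder itself
Set-AdlStoreItemAclEntry `
    -AccountName $resourceName `
    -Path $adlsPath `
    -AceType Group `
    -Permissions 'ReadExecute' `
    -Id $groupId.Id
```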

The portal interface makes it easy to apply permissions to existing child-level folders and files. It is a little harder to do via scripting, since your script needs to be explicitly set up to handle recursive operations. Therefore, try to assign relevant permissions as early as possible in your design/development/operationalization phase.

When you set permissions on existing data, it can take a little while if you are asking it to recursively traverse the folders and files to set permissions for every object. This is another reason to try to set permissions at the AAD group level, rather than via individual users. 

The PowerShell cmdlets to manage ADLS changed in January 2018. See this post: Breaking changes to Azure Data Lake Store cmdlets

Defining ADLS Data Permissions in the Azure Portal

In my example, I want to assign read + execute permissions for the StandardizedData folder, but not for the RawData folder. In the portal, I open Data Explorer, navigate to the applicable folder which sets the "scope" for the permissions, then click the Access button:

[Screenshot: ADLS_ACL_Portal_1.jpg]

Click the Add button to select a user or a group. Notice the permissions are read/write/execute. You can have the new permission entry added to all existing child folders & files (which you typically want to do). The last radio button is really important - this lets you set it as both an access entry *and* the default entry.

[Screenshot: ADLS_ACL_Portal_2.jpg]

Important! When using the web interface as shown above, you need to leave the blade open while it assigns permissions. If you navigate away and close it before it completes, the process will get interrupted.

Defining ADLS Data Permissions via PowerShell Script

The technique shown above in the portal is convenient for quick changes, for learning, or for "one-off" scenarios. However, in an enterprise solution, or a production environment, it's a better practice to handle permissions via a script so you can do things such as:

  • Promote changes through different environments
  • Pass off scripts to an administrator to run in production
  • Include permission settings in source control

Group Permissions

In the following script, we are assigning read+execute permissions to a group:

  • Step 1 defines the access ACL.
  • Step 2 defines the default ACL. (Thanks to Saveen Reddy from the ADL team who very kindly clued me into needing to set the default in a separate step. I was stumped on that one for a bit.)
  • Step 3 applies the folder-level access to the child objects. Note that it only goes one level deep in the folder structure. Therefore, you'll want to construct the script to be 'smarter' about recursion if you have a large number of folders which already exist, or just pass in an array list of the folders it should apply to - see Shannon Lowder's blog for an example of this technique. If any of the child objects are files, the default ACL step in the foreach loop will fail for them (but it works fine for folders).

[Screenshot: ADLS_ACL_PowerShell.jpg]

Here's the copy/paste friendly script for the above screenshot - for a group:

#-----------------------------------------

#Input Area
$subscriptionName = 'YourSubscriptionName'
$resourceGroupName = 'YourResourceGroupName'
$resourceName = 'YourResourceName'
$adlsPath = '/Folder/Subfolder'
$groupName = 'YourAADGroupName'
$permissionType = 'ReadExecute'

#-----------------------------------------

#Manual login into Azure
#Login-AzureRmAccount -SubscriptionName $subscriptionName

#-----------------------------------------

#Step 1: Set the access permissions at the folder level
$groupId = Get-AzureRmADGroup -SearchString $groupName  
Set-AdlStoreItemAclEntry  `
    -AccountName $resourceName `
    -Path $adlsPath `
    -AceType Group `
    -Permissions $permissionType `
    -Id $groupId.Id 

#Step 2: Set the default at the folder level
Set-AdlStoreItemAclEntry  `
    -AccountName $resourceName `
    -Path $adlsPath `
    -AceType Group `
    -Permissions $permissionType `
    -Id $groupId.Id `
    -Default 

#Step 3: Set existing child objects to be the same as the folder level
$childObjects = Get-AzureRmDataLakeStoreChildItem `
    -AccountName $resourceName `
    -Path $adlsPath
$arrayOfObjectNames = @($childObjects.Name)
foreach ($objectName in $arrayOfObjectNames) 
    {
     Write-Host "Setting ACL for $adlsPath/$objectName"
     #Set the access
     Set-AdlStoreItemAclEntry  `
        -AccountName $resourceName `
        -Path "$adlsPath/$objectName" `
        -AceType Group `
        -Permissions $permissionType `
        -Id $groupId.Id 
     #Set the default
     Set-AdlStoreItemAclEntry `
        -AccountName $resourceName `
        -Path "$adlsPath/$objectName" `
        -AceType Group `
        -Permissions $permissionType `
        -Id $groupId.Id `
        -Default 
    }
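
If you need to go deeper than one level, one option is a recursive function. The following is a sketch rather than a tested production script: the function name Set-AdlsFolderAclRecursive is my own invention, and it assumes each child item returned by Get-AzureRmDataLakeStoreChildItem has a Type property that reads DIRECTORY for folders. It sets access + default entries on folders, and access entries only on files, so the default step doesn't fail on files.

```powershell
# Placeholder values (assumptions for illustration only)
$resourceName   = 'YourResourceName'
$adlsPath       = '/Folder/Subfolder'
$groupName      = 'YourAADGroupName'
$permissionType = 'ReadExecute'

# Hypothetical helper: walks a folder tree and applies the same group entry everywhere
function Set-AdlsFolderAclRecursive {
    param(
        [Parameter(Mandatory = $true)] [string] $AccountName,
        [Parameter(Mandatory = $true)] [string] $FolderPath,
        [Parameter(Mandatory = $true)] [Guid]   $GroupObjectId,
        [Parameter(Mandatory = $true)] [string] $Permission
    )

    Write-Host "Setting ACL for folder $FolderPath"

    # Folder: access entry, then default entry (two separate calls)
    Set-AdlStoreItemAclEntry -AccountName $AccountName -Path $FolderPath `
        -AceType Group -Permissions $Permission -Id $GroupObjectId
    Set-AdlStoreItemAclEntry -AccountName $AccountName -Path $FolderPath `
        -AceType Group -Permissions $Permission -Id $GroupObjectId -Default

    $children = Get-AzureRmDataLakeStoreChildItem -AccountName $AccountName -Path $FolderPath
    foreach ($child in $children) {
        # TrimEnd handles the root folder '/' without producing '//name'
        $childPath = "$($FolderPath.TrimEnd('/'))/$($child.Name)"

        if ("$($child.Type)" -eq 'DIRECTORY') {
            # Child folder: recurse so it gets access + default as well
            Set-AdlsFolderAclRecursive -AccountName $AccountName -FolderPath $childPath `
                -GroupObjectId $GroupObjectId -Permission $Permission
        }
        else {
            # Child file: access entry only, since files have no default ACL
            Write-Host "Setting ACL for file $childPath"
            Set-AdlStoreItemAclEntry -AccountName $AccountName -Path $childPath `
                -AceType Group -Permissions $Permission -Id $GroupObjectId
        }
    }
}

# Usage
$groupId = Get-AzureRmADGroup -SearchString $groupName
Set-AdlsFolderAclRecursive -AccountName $resourceName -FolderPath $adlsPath `
    -GroupObjectId $groupId.Id -Permission $permissionType
```

This makes one or two calls per object, so expect it to be slow on a large folder tree.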

User Permissions

This next script is nearly the same, but this time we are assigning read+execute permissions to a user instead of a group (which should be the exception, not the rule):

[Screenshot: ADLS_ACL_PowerShell_User.jpg]

And, the copy/paste friendly script that goes with the above screenshot - for a user:

#-----------------------------------------

#Input Area
$subscriptionName = 'YourSubscriptionName'
$resourceGroupName = 'YourResourceGroupName'
$resourceName = 'YourResourceName'
$adlsPath = '/Folder/Subfolder'
$userName = 'UserNameInEmailFormat'
$userPermission = 'ReadExecute'

#-----------------------------------------

#Manual login into Azure
#Login-AzureRmAccount -SubscriptionName $subscriptionName

#-----------------------------------------

#Step 1: Set the access permissions at the folder level
$userId = Get-AzureRmADUser -UPN $userName 
Set-AdlStoreItemAclEntry  `
     -AccountName $resourceName `
     -Path $adlsPath `
     -AceType User `
     -Permissions $userPermission `
     -Id $userId.Id 

#Step 2: Set the default at the folder level
Set-AdlStoreItemAclEntry  `
     -AccountName $resourceName `
     -Path $adlsPath `
     -AceType User `
     -Permissions $userPermission `
     -Id $userId.Id `
     -Default 

#Step 3: Set existing child objects to be the same as the folder level
$childObjects = Get-AzureRmDataLakeStoreChildItem `
     -AccountName $resourceName `
     -Path $adlsPath
$arrayOfObjectNames = @($childObjects.Name)
foreach ($objectName in $arrayOfObjectNames) 
    {
     Write-Host "Setting ACL for $adlsPath/$objectName"
     #Set the access
     Set-AdlStoreItemAclEntry  `
          -AccountName $resourceName `
          -Path "$adlsPath/$objectName" `
          -AceType User `
          -Permissions $userPermission `
          -Id $userId.Id 
     #Set the default
     Set-AdlStoreItemAclEntry  `
          -AccountName $resourceName `
          -Path "$adlsPath/$objectName" `
          -AceType User `
          -Permissions $userPermission `
          -Id $userId.Id `
          -Default 
    }

In both of the above scripts, if you are setting permissions for only child files (rather than folders), you can remove the second Set-AdlStoreItemAclEntry call inside the foreach loop, which sets the default. There's no concept of a default at the file level.

Finding More Information

PowerShell Cmdlets for Azure Data Lake Store

Breaking Changes to Data Lake Store Cmdlets

Access Control in Azure Data Lake Store <--Definitely take time to read this

Secure Data in Azure Data Lake Store