Handling Row Headers in U-SQL

This is a quick tip about syntax for handling row headers in U-SQL, the data processing language of Azure Data Lake Analytics. There are two components: handling row headers on the source data which is being queried, and row headers on the dataset being generated by ADLA.

Skipping the row headers present on the first row of the source data:

USING Extractors.Csv(skipFirstNRows:1)

Outputting row headers on row 1 of the dataset being generated:

USING Outputters.Csv(outputHeader:true);

Here is a full U-SQL example which includes both:

DECLARE @inputPath string = "/RawData/{date:yyyy}/{date:MM}/{filename:*}.csv";
DECLARE @outputPath string = "/CuratedData/POC.csv";

@data = 
EXTRACT InstanceID string
,TransactionID string
,TimestampUtc string
,ClassID string
,ClassName string
,CurrentValue string
,date DateTime //virtual column
,filename string //virtual column 
FROM @inputPath 
USING Extractors.Csv(skipFirstNRows:1);

@result =
SELECT InstanceID AS InstanceID
,TransactionID AS TransactionID
,TimestampUtc AS TimestampUTC
,ClassID + "-" + ClassName AS Class
,CurrentValue AS CurrentValue
,date AS TransactionDate
,filename AS SourceFileName
,1 AS NbrOfTransactions
FROM @data
WHERE ClassName == "EX1";

OUTPUT @result 
TO @outputPath 
USING Outputters.Csv(outputHeader:true,quoting:true);

As a reminder: U-SQL is a batch-oriented language which requires its output to be written to a destination file. It's not intended to be an ad hoc query language at the time of this writing.
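
For readers who want to see the two options in isolation, here is a minimal sketch in Python (not U-SQL, and not part of the original script) of what skipping the first row on input and writing a header row on output amounts to. The sample data and column names are made up for illustration, and csv.QUOTE_ALL is only an approximation of quoting:true.

```python
import csv
import io

# Hypothetical source data whose first row is a header row.
source = io.StringIO("InstanceID,ClassName\n1,EX1\n2,EX2\n")

reader = csv.reader(source)
next(reader, None)  # analogous to skipFirstNRows:1 (discard the header row)
rows = [row for row in reader if row[1] == "EX1"]

# Write the result with a header on row 1, analogous to outputHeader:true.
out = io.StringIO()
writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(["InstanceID", "ClassName"])
writer.writerows(rows)
result = out.getvalue()
```

Without the next(reader, None) call, the header row would be treated as data, which is the same problem skipFirstNRows avoids in the extractor.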

PowerShell for Assigning and Querying Tags in Azure

Tags in Azure are useful pieces of metadata for documenting (annotating) things such as:

  • Billing or cost center categories (ex: general ledger code)
  • Environment names (ex: Dev, Test, Prod, Sandbox)
  • Project or system
  • Purpose or application
  • Team, group, department, or business unit
  • Who owns or supports the resource
  • Release or version numbers (ex: for testing infrastructure)
  • Archival date (ex: if infrastructure is only needed temporarily)
  • Who initially created the resource
  • Which customer a resource applies to (ex: for an ISV)
  • Service level agreement
  • Patching or maintenance window
  • etc…

Tags are free-form key/value pairs. So, they can be used for tracking anything you find to be helpful. Tags are particularly helpful for breaking down invoicing costs. For instance, rather than seeing the entire cost for certain resources (like VMs or storage) in a resource group, tags allow you to subdivide the resource costs further, or to group costs in another way across resource groups. Here's what the tags look like when you download usage (the new V2 format) for your subscription:

You can assign tags to resource groups, as well as to individual resources which support Azure Resource Manager. The individual resources do not automatically inherit tags from the parent resource group. A maximum of 15 key/value pairs can be assigned (though you could store concatenated values or embedded JSON in a single tag value as a workaround). You may want to assign tags at just the resource group level, and use custom queries to "inherit" them at the resource level. Alternatively, you may want to assign tags to the individual resources directly, particularly if you want to see them clearly on the standard "download usage" report of billing.
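
To make the "inherit" idea concrete, here is a minimal sketch in Python of how a custom query might compute the effective tags for a resource. It is an illustration only, not an Azure API: the function name effective_tags is hypothetical, the rule that a resource's own tags win over the resource group's is my assumption, and the limit of 15 is simply the figure quoted above.

```python
MAX_TAGS = 15  # limit quoted in this post; treat it as an assumption


def effective_tags(resource_group_tags, resource_tags, max_tags=MAX_TAGS):
    """Combine resource group tags with a resource's own tags.

    Tags set directly on the resource override tags of the same name
    that come from the resource group (an assumed precedence rule).
    """
    merged = {**resource_group_tags, **resource_tags}
    if len(merged) > max_tags:
        raise ValueError(f"{len(merged)} tags exceeds the limit of {max_tags}")
    return merged


rg_tags = {"billingCategory": "Internal Analytics", "environmentType": "Dev"}
vm_tags = {"supportContact": "Analytics Team", "environmentType": "Test"}
combined = effective_tags(rg_tags, vm_tags)
```

Here combined carries all three tag names, with environmentType taken from the resource rather than the resource group.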

Since the key/value pairs are just free-form text, watch out for uniformity issues. To improve consistency, you can utilize policies to require tags and/or apply defaults if you'd like (for example, you might want to enforce a "Created By" tag). Tags can be set in the ARM template when you initially deploy a resource (which is best so that no billing occurs without proper tagging), or afterwards to existing resources via the portal, PowerShell, or CLI.

The three tags I'm currently using in an implementation are Billing Category, Environment Type, and Support Contact:


The above screen shot shows setting tags within the portal. If you have more than a handful of resources, that won't be efficient at all. Following are a few PowerShell scripts to help with setting tags.

Assign Tags to a Resource

This script will *overwrite* any and all tags previously assigned to one resource.

$resourceGroupName = 'InternalReportingRGDev'
$resourceName = 'bisqlvm1datastdstrgdev'

$azureResourceInfo = Find-AzureRmResource -ResourceGroupNameEquals $resourceGroupName -ResourceNameEquals $resourceName 

Set-AzureRmResource -Tag @{ billingCategory="Internal Analytics"; supportContact="Analytics Team"; environmentType="Dev" } -ResourceName $resourceName -ResourceType $azureResourceInfo.ResourceType -ResourceGroupName $resourceGroupName -Force 

Assign Tags to a Resource Group

This script will *overwrite* any and all tags previously assigned to one resource group.

$resourceGroupName = 'InternalReportingRGDev'

$azureRGInfo = Get-AzureRmResourceGroup -Name $resourceGroupName

Set-AzureRmResourceGroup -Id $azureRGInfo.ResourceId -Tag @{ billingCategory="Internal Analytics"; supportContact="Analytics Team"; environmentType="Dev" } 

Assign Tags to All Resources Within a Resource Group (Inherited from the RG)

This script will *overwrite* any and all tags previously assigned to every resource within one resource group, replacing them with the tags of the resource group.

$resourceGroupName = 'InternalReportingRGDev'

$azureRGInfo = Get-AzureRmResourceGroup -Name $resourceGroupName
foreach ($item in $azureRGInfo) {
    Find-AzureRmResource -ResourceGroupNameEquals $item.ResourceGroupName | ForEach-Object { Set-AzureRmResource -ResourceId $PSItem.ResourceId -Tag $item.Tags -Force }
}

Add an Additional Tag to a Resource Group

This script adds a new tag and preserves existing tags for one resource group. It only accepts new tags (i.e., it will error out if you repeat existing tags).

$resourceGroupName = 'InternalReportingRGDev'

$azureRGTags = (Get-AzureRmResourceGroup -Name $resourceGroupName).Tags

$azureRGTags+= @{ billingCategory345="Internal Analytics" }

Set-AzureRmResourceGroup -Tag $azureRGTags -Name $resourceGroupName 


Query to Get List of Tag Names & Values for a Specific Resource

This checks for the tags assigned to one resource.

$resourceGroupName = 'InternalReportingRGDev'
$resourceName = 'bisqlvm1datastdstrgdev'

(Find-AzureRmResource -ResourceGroupNameEquals $resourceGroupName -ResourceNameEquals $resourceName).Tags

Output is a list of each Name/Value pair which has been assigned:


Query to Get List of Resource Groups With a Specific Tag Value Assigned

(Find-AzureRmResourceGroup -Tag @{ billingCategory="Internal Analytics" }).Name 

Output is a list of resource groups which have been assigned that tag name and value.


Query to Get List of Resources With a Specific Tag Value Assigned

(Find-AzureRmResource -Tag @{ environmentType="Dev" }).Name 

Output is a list of resources which have been assigned that tag name and value.


Query to Get List of Resources With a Tag Set Based on Tag Name

(Find-AzureRmResource -TagName 'billingCategory').Name 

Output is a list of which resources have a specific tag assigned (regardless of the tag's value).


Naming Conventions in Azure

I must admit right up front: I'm more than a little obsessed with naming conventions. Prefixes...suffixes...I really enjoy coming up with the optimal convention to use so that a name is at least somewhat self-documenting without being horribly long.

In Azure, we use a combination of resource groups and resource names to organize things. Here's the naming convention for resources that I currently prefer:

Purpose --> Type of Service --> Environment

Resource Group examples:




Virtual Machine examples:

BISQLVM1Dev (this one runs the engine, SSIS, and MDS)

BISQLVM2Dev (this one runs SSAS in multidimensional mode)





Storage Account examples (non-managed storage):

BISQLVM1DataStrgDev (this is the unit of recovery for a single VM)


BISQLVM1BckStrgDev (SQL Server backups; sent to geo-redundant storage)


BISQLVM1DiagStrgDev (diagnostic & logging data)


Additional criteria:

  • Some resources require a unique name across all of Azure. This is the case if the resource has a data access endpoint or URI. Therefore, depending on your implementation, you might need to auto-generate part of the name to enforce uniqueness. In that case, I would still try to make the prefix human-understandable, followed by the uniqueString().
  • The type of service in the name helps with logging/metrics in monitoring scenarios. You probably want to use a standard abbreviation (in the examples above, RG is for resource group; VM is for virtual machine; and Strg is for storage).
  • Environment (such as Dev/Test/Prod) as the suffix makes concatenations easier within scripts. You may also want to use tags to denote the environment.
  • We do denote Prod in the name of Production resources (rather than omitting it completely). This is because our Dev/Test/Prod resources are all contained within one subscription, separated by resource group. Therefore, we want to be very specific.
  • No dashes (hyphens) or underscores in the names since not all resources allow them.
  • Camel case if the resource allows it; otherwise all lower case. (Storage accounts are actually required to be all lower case - the storage example shown above is camel case only because it's easier to read in this blog post.) You might decide to do the opposite: always go lower case so things are consistent. There's some appeal in that too.
  • The maximum length allowed varies quite a bit between resources.
  • Depending on how many teams/groups in your organization share a subscription, you might want to include a prefix of who owns or maintains this resource. This can also be done with a tag, but this sort of naming convention is helpful for sorting resources especially if you have wide permissions to see the entire subscription.
  • There are times when you do need the names to be consistent across all environments, or you introduce too much complexity. One example of this is Azure SQL Database (its parent server differs between Dev, Test, and Prod but not the database name itself). Another example is Azure Data Factory: to avoid maintaining duplicate JSON code, I'm finding it best to keep the names of linked services, datasets, and pipelines the same across data factories (whereas the name of the Azure Data Factory resource itself would differ between Dev, Test, and Prod).
  • In addition to Dev/Test/Prod, we also use a 'Sandbox' as a suffix which essentially means: this is an individual's area for learning or exploration. For instance: DataScienceVMSandbox.
  • Unless you're supporting a multi-tenant platform, putting the organization/company name within the name doesn't add any valuable context.
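
The Purpose --> Type of Service --> Environment convention is easy to apply from a script. Here is a minimal sketch in Python, assuming a hypothetical helper called build_resource_name; it is an illustration, not something from my deployment scripts. The max_length of 24 used for the storage account reflects the 3 to 24 character, lower case letters and digits rule for storage account names; other resource types have their own limits.

```python
from typing import Optional


def build_resource_name(purpose: str, service_type: str, environment: str,
                        lowercase: bool = False,
                        max_length: Optional[int] = None) -> str:
    """Concatenate Purpose + Type of Service + Environment with no separators."""
    name = f"{purpose}{service_type}{environment}"
    # Dashes and underscores are avoided since not all resources allow them.
    if "-" in name or "_" in name:
        raise ValueError(f"'{name}' contains a dash or an underscore")
    if lowercase:
        name = name.lower()  # e.g. storage accounts must be all lower case
    if max_length is not None and len(name) > max_length:
        raise ValueError(f"'{name}' is longer than {max_length} characters")
    return name


vm_name = build_resource_name("BISQL", "VM1", "Dev")
storage_name = build_resource_name("BISQLVM1Data", "Strg", "Dev",
                                   lowercase=True, max_length=24)
```

Keeping the environment as the last argument mirrors the point above about suffixes: a script can loop over Dev, Test, and Prod and change only that one value.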

In the end, the actual naming convention you decide on isn't the important thing. Most important is that you can locate resources easily and secure them properly -- which requires some consistency with the naming policy.

New Whitepaper on Planning a Power BI Enterprise Deployment

I'm excited to share that a new technical whitepaper I co-authored with Chris Webb is published. It's called Planning a Power BI Enterprise Deployment. It was really a fun experience to write something a bit more formal than blog posts. My interest in Power BI lies in how to successfully deploy it, manage it, and what the end-to-end story is especially from the perspective of integration with other data assets in an organization. Power BI has grown to be a huge, wide set of features so we got a little verbose at just over 100 pages.

A huge thank you to Chris Webb for inviting me to be his co-author. Chris is not only whip-smart, but a total pleasure to work with. 

Another big thank you to Meagan Longoria for being our tech editor. I like to think of myself as detail-oriented, but I've got nothin' compared to her eagle eye.

We worked primarily with Adam Wilson at Microsoft in terms of getting information, so he deserves a thank you as well for dealing with the questions that Chris and I peppered him with week after week.

I hope you find the whitepaper to be useful.