Handling Row Headers in U-SQL

This is a quick tip about syntax for handling row headers in U-SQL, the data processing language of Azure Data Lake Analytics. There are two components: handling row headers on the source data which is being queried, and row headers on the dataset being generated by ADLA.

Detecting that row headers are present on the first row of the source data:

USING Extractors.Csv(skipFirstNRows:1)
 

Outputting row headers on row 1 of the dataset being generated:

USING Outputters.Csv(outputHeader:true);

   
Here is a full U-SQL example which includes both:

DECLARE @inputPath string = "/RawData/{date:yyyy}/{date:MM}/{filename:*}.csv";
DECLARE @outputPath string = "/CuratedData/POC.csv";

@data = 
EXTRACT
InstanceID string
,TransactionID string
,TimestampUtc string
,ClassID string
,ClassName string
,CurrentValue string
,date DateTime //virtual column
,filename string //virtual column 
FROM @inputPath 
USING Extractors.Csv(skipFirstNRows:1);

@result =
SELECT 
 InstanceID AS InstanceID
,TransactionID AS TransactionID
,TimestampUtc AS TimestampUTC
,ClassID + "-" + ClassName AS Class
,CurrentValue AS CurrentValue
,date AS TransactionDate
,filename AS SourceFileName
,1 AS NbrOfTransactions
FROM @data
WHERE ClassName == "EX1";

OUTPUT @result 
TO @outputPath 
USING Outputters.Csv(outputHeader:true,quoting:true);


As a reminder: U-SQL is a batch-oriented language which requires its output to be written to a destination file. It's not intended to be an ad hoc query language at the time of this writing.

Want to Know More?

My next all-day workshop on Architecting a Data Lake is in Raleigh, NC on April 13, 2018

You Might Also Like...

Data Lake Use Cases and Planning Considerations

Querying Multi-Structured JSON Files with U-SQL in Azure Data Lake