Including File Properties and Metadata in a U-SQL Script

When working on big data systems, it can be very helpful to include file properties and other metadata directly within the data results. Capturing this kind of data lineage is especially useful when reconciling data or troubleshooting issues (for instance, if retry logic kicked in somewhere in the data stream and you now have duplicate rows to handle).

I just learned that U-SQL has some new syntax that supports the following file properties:

  • URI (uniform resource identifier)
  • Modified date
  • Created date
  • Length (file size in bytes)

In the following example, I'm using U-SQL (Azure Data Lake Analytics) to iterate over files in date-partitioned subfolders under Raw Data within Azure Data Lake Store. The new file properties are part of the schema-on-read definition of the source files (aka the extract statement), and are shown in yellow:

U-SQL_File_Properties.jpg
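
In case the screenshot doesn't come through, here is a minimal sketch of what an extract statement like this can look like. The folder paths, column names, and output location are hypothetical placeholders, not the exact ones from my script. The FILE.URI(), FILE.MODIFIED(), FILE.CREATED(), and FILE.LENGTH() calls are the file property syntax described in the release notes.

```
// Sketch only: the paths and column names below are hypothetical placeholders.
DECLARE @inputPath string = "/RawData/{FileDate:yyyy}/{FileDate:MM}/{FileDate:dd}/{FileName}.csv";
DECLARE @outputPath string = "/CuratedData/FilePropertiesExample.csv";

@rawData =
    EXTRACT
        // Columns that physically exist in the source files (assumed schema)
        CustomerId       int,
        CustomerName     string,
        // Virtual columns populated from the folder and file name pattern
        FileDate         DateTime,
        FileName         string,
        // File property columns computed for each source file at extract time
        FileUri          = FILE.URI(),
        FileModifiedDate = FILE.MODIFIED(),
        FileCreatedDate  = FILE.CREATED(),
        FileLengthBytes  = FILE.LENGTH()
    FROM @inputPath
    USING Extractors.Csv(skipFirstNRows : 1);

// Write the rows out with the lineage columns included
OUTPUT @rawData
TO @outputPath
USING Outputters.Csv(outputHeader : true);
```

Every row read from a given file carries that file's URI, dates, and size. That is what makes it possible to trace a row back to its source file later, for example when tracking down duplicates.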

The output for the virtual columns looks like this:

U-SQL_File_Properties_Output.jpg

You can find more info about this in the release notes on GitHub.

Like This Content?

If you are integrating data between Azure services, you might be interested in an all-day session Meagan Longoria and I are presenting at PASS Summit in November. It's called "Designing Modern Data and Analytics Solutions in Azure." Check out info here: http://www.pass.org/summit/2018/Sessions/Details.aspx?sid=78885 
