How to Create a Demo/Test Environment for Azure Data Catalog

Azure Data Catalog is a Software as a Service (SaaS) offering in Azure, part of the Cortana Intelligence Suite, for registering metadata about data sources. Check this post for an overview of Azure Data Catalog key features. (I'm a big fan of what Azure Data Catalog is trying to accomplish.)

There are a couple of particulars about Azure Data Catalog which make it a bit more difficult to set up a Demo/Test/Learning type of environment, including:

  • You are required to sign into Azure Data Catalog with an organizational account. Signing in with a Microsoft account (formerly known as a Live account) won't work for Azure Data Catalog authentication, even if that's what you normally use for Azure.
  • One Azure Data Catalog may be created per organization. Note this is *not* per Azure subscription - if your account has access to multiple subscriptions, it's still one catalog per organization.

These restrictions are because the intention is for Azure Data Catalog to be an enterprise-wide sole system of registry for enterprise data sources.

Summary: Creating a Demo/Test Environment for Azure Data Catalog

Because of the above two restrictions, we need to create a Demo/Test/Learning sort of environment in a particular way. For the remainder of this post, the objective is to create a Data Catalog outside of your normal organizational Azure environment - i.e., associated to an MSDN account for instance. 

With some very helpful advice from Matthew Roche (from the Azure Data Catalog product team at Microsoft), the best method currently to create a Data Catalog test environment is as follows:

  1. Sign into the Azure portal with a Microsoft account (not with your organizational account). You should be the administrator of this subscription. For instance, my subscription is associated with my MSDN.
  2. In your Azure Active Directory (AAD), create a new AAD account. This cannot be associated to a Microsoft account; it needs to be a native AAD account. A native AAD account is recognized by the Data Catalog service as an organizational account. 
  3. Allow this new AAD account to be co-administrator of the subscription. This will permit the AAD account to provision the new Azure Data Catalog service.
  4. Go to the Azure Data Catalog portal at www.azuredatacatalog.com and sign in with the new AAD account. Provision a new Azure Data Catalog from here. You will continue to do all of the work in Azure Data Catalog with this separate AAD ID (and additional AAD IDs if desired).

The objective of this is to leave the Azure Data Catalog in your 'real' organizational Azure subscription free of test or temporary use data sources - i.e., you wouldn't want users in your environment to discover something like an AdventureWorks sample database in the catalog (loophole: if you are paying for the standard version of Azure Data Catalog, rather than the free version, you do get security capabilities and could restrict a data source to just yourself so others can't find it).

Sidenote: One additional important thing to be aware of with Azure Data Catalog is that a data source may be registered in the catalog only once. This is to prevent duplicates which could be really confusing to users of the system.

Below are further details about how to make this approach work.

Details: Creating a Demo/Test Environment for Azure Data Catalog

Step 1: Sign into Azure portal for which you are an administrator.

First, sign into the Azure portal with your Microsoft account (such as user@outlook.com). As of the time of this writing (April 2016), Azure Active Directory is still managed in the old portal not in the new portal yet. The old portal can be found at https://manage.windowsazure.com/

For our demo/test purposes, this should not be your organizational account (such as user@companyname.com). And of course, you need administrative privileges for the Azure subscription.

Step 2: Create a Native Azure Active Directory Account.

Go to the Active Directory menu, then select your default directory:

On the Users page, select Add User at the bottom:

Create a new user with the name you prefer:

 
 
 

Be sure to jot down the temporary password assigned by Azure.

At this point, you should see your new user on the AAD Users page. The key to making this all work is the account is sourced from 'Microsoft Azure Active Directory' and is *not* a Microsoft Account. (An account sourced from your organization's Active Directory works too...but we're trying to create a demo outside of the organizational Azure tenant.)

Next let's reset that temporary password now. 

Open up an InPrivate or Incognito browser window and go to https://login.microsoftonline.com/. By using InPrivate or Incognito, the login screen will reliably accept any type of account (otherwise it makes assumptions based on the type of account you're logged onto your machine with currently). You'll want to either use a different browser, or close the Azure portal, before this step so that it doesn't sign you in with the account you're logged into Azure with.

Sign in with the new AAD account we just created. When prompted, put in the current temporary password and reset to a new password. Close this browser window when finished resetting the password.

Login.jpg
 

Step 3: Provide Co-Administrator Permissions to the New AAD Account.

Back in the Azure portal (we're still using the old portal at https://manage.windowsazure.com/ since this functionality isn't yet exposed in the new portal). Here you sign in with your Microsoft account again (just like step 1). Go to the Settings menu, then the Administrators page, then click Add:

AzureAdministrator.jpg

Input the e-mail address of your new AAD user. You'll see a green check when it's validated.

AzureAdministrator2.jpg
 

At this point you should see your native AAD account listed on the Administrators page. Now we know that account has permission to create the Azure Data Catalog service. (I can be more liberal with this sort of setting because the Azure tenant I'm working in only contains demo data, not any real data.)

AzureAdministrator3.jpg

Go ahead and close the Azure browser window as we are finished with the Azure portal.

Step 4: Provision a New Azure Data Catalog from the Data Catalog Portal.

Now it's time to provision the Azure Data Catalog using our AAD account.

Launch a new browser window using InPrivate or Incognito (this will ensure you'll be able to seamlessly sign in with your AAD account) and go to the Azure Data Catalog portal at https://azuredatacatalog.com. When prompted, sign in with your AAD account. 

If everything with the AAD account is set up correctly, you should see a page which prompts you to create a new Azure Data Catalog:

CreateCatalog.jpg

Tip: Remember you can only have one catalog per organization, so be sure to give it a pretty broad name.

You can go ahead and add any other users and administrators for the Catalog as appropriate, provided they are not personal Microsoft accounts. 

Troubleshooting Access to Azure Data Catalog

This account does not have permission to access Azure Data Catalog

When signing into the Azure Data Catalog portal, the message "This account does not have permission to access Azure Data Catalog" is generated when you have signed in with an appropriate kind of account, and a catalog already exists somewhere, but your account doesn't have permission to access it. 

ADCPortal_RequestAccess.jpg

To figure out more info about the existing data catalog, first check if you see any catalogs in the new Azure portal at https://portal.azure.com/. If no catalogs are listed, this means the catalog resides in a subscription which you don't have permission to see in the Azure portal. Since there's one catalog per organization - if there's a subscription you cannot see, it's possible that's where the catalog is at. 

ADCPage.jpg

To get more information try to create a new catalog. You'll see a message "Only one Azure Data Catalog is supported per organization. A Catalog has already been created for your organization. You cannot add additional catalogs." Click the link under that message to "Access existing Azure Data Catalog." 

ADCPortal_RequestAccess2.jpg
 

Under User, you should be able to see the name of the catalog which may give you a hint as to which subscription it resides in. In any case, you need to talk to your Azure service administrator if this happens to determine if the catalog that is set up is really what/where you want it to be.

ADCPage2.jpg
 

You've logged in using a personal Microsoft account

When signing into the Azure Data Catalog portal, the message "You've logged in using a personal Microsoft account" is generated when you're not using an organizational account recognized by the Data Catalog service. Here's where you want to refer to the instructions above in this post to create a native Azure Active Directory (AAD) account to use for logging into Azure Data Catalog.

ADCPortal_PersonalAccount.jpg

Microsoft BI Integration for the Four Primary Report Types

We are currently experiencing a major evolution of the MSBI reporting toolset in accordance with the Microsoft BI Roadmap announced in October 2015. This diagram depicts current state the way I understand it.

(Updated as of mid-June 2017)

Also, here's a short video walking through the diagram:  http://www.blue-granite.com/blog/video-overview-of-microsoft-bi-reporting-tool-integration

  • Analytical reports:  produced in Excel
  • Interactive reports: produced in Power BI Desktop
  • Paginated reports: produced in Reporting Services (SSDT) or Report Builder
  • Mobile reports: produced in the Mobile Report Publisher (previously Datazen)

The four primary report types are in the process of becoming more deeply integrated into:

  • The Power BI Service (a cloud service), and
  • Power BI Report Server (on-premises portal option - includes 3 report types)
  • Reporting Services portal (on-premises portal option - includes 2 report types; requires SQL Server 2016 in native mode). 

As you'll see on the above diagram, every report type isn't compatible everywhere yet though I expect we'll see that over time. As things evolve, I'll keep the diagram updated. Hope you find it useful.

Following is how each of the report types look in the SSRS portal and Power BI when this post was originally written. Since then, there have been quite a few changes (i.e., Power BI Reports can only be rendered in Power BI Report Server -- not in a traditional SSRS portal).

Should You Use a SQL Server Marketplace Image for an Azure Virtual Machine?

Virtual machines (VMs) in Azure are fantastic. They are considered "Infrastructure as a Service" (IaaS) and can implement production workloads, dev/test environments, and as learning/sandbox areas.  For instance, with SQL Server 2016 having so many new features, it's great to be able to fire up a VM quickly, stop it when I'm not using it, and delete it when I'm done with it.

As more and more customers are interested in moving some portion of their BI/analytics workloads to cloud services, one question that comes up occasionally is whether or not you should start with a marketplace image that has SQL Server already installed. So far I've noted a few key considerations for this decision:

  1. Do you want to pay for the SQL Server license as part of the VM pricing?
  2. Do you want to configure SQL Server in a specific way (i.e., following best practices)?
  3. Do you want Azure to handle things like automated patching by default?

What is an Azure Marketplace VM Image?

The Azure Marketplace is also sometimes called the Azure Gallery. It's an "online store" with a ton of predefined solutions including VM images to use as starting points. When you click the Add button to create a new virtual machine, you find there's a wide variety of predefined images to choose from:

 

For instance, if you do a search for SQL Server 2016, you'll see a few choices of images which have SQL Server pre-installed:

Following are a few considerations regarding if you want to utilize a pre-built image for SQL Server.

Pricing of Virtual Machine

Let's say you already own SQL Server licenses, potentially with Software Assurance. In this case you probably do not want to pay for SQL Server as part of the VM pricing. 

Below is example pricing for a small-ish A6 Windows machine (in reality, a D series is recommended for running SQL Server). Using a Windows image means you would install SQL Server yourself which would be considered a "bring your own license" type of situation for SQL Server:

 

Example of a SQL Server machine which includes a SQL Server Standard license:

 

Example of a SQL Server machine which includes a SQL Server Enterprise license:

 

You can find the pricing calculator here:  https://azure.microsoft.com/en-us/pricing/calculator/. Obviously the above screen shots show retail pricing, but it makes the point that there's a difference in pricing with a base Windows machine vs a SQL Server VM. Do keep in mind that you only pay the licensing costs for each individual hour that the VM is running. This Azure article discusses pricing considerations a bit further.

{Updated} Microsoft EA customers can "Bring Your Own License." Look for the SQL VMs that have BYOL in the name. More info is here: https://azure.microsoft.com/en-us/blog/easily-bring-your-sql-server-licenses-to-azure-vms.

Having said that, there's other reasons you may want to install SQL Server yourself anyway...

Setting Up SQL Server in an Azure VM Using Best Practices

If you are a DBA with specific ideas about how things should be set up (as well you should), then might lean towards a Windows image and install/configure SQL Server yourself (assuming you have a SQL license that is). With the most recent SQL Server images I've used (as of mid 2016--definitely verify yourself), there are two drives for data, logs, and TempDB. The image also installs all services, so you will want to disable startup or uninstall what you don't need. Also, the new options available in SQL Server 2016 setup (such as TempDBs) haven't made their way to Azure setup options yet. I'm a BI developer, not a DBA, so I'm not even going to try to enumerate what I think should or shouldn't change from the built-in SQL Server image, but the point is that you'll need to spend some time verifying and reconfiguring things to be the way you want them to be. 

Recommendations and documentation for how the SQL image is setup can be found here: Performance Best Practices for SQL Server in Azure Virtual Machines. There's lots of good info there.

Control Over Patching of SQL Server and Windows

The recent SQL Server images I've seen have automated patching by default, which includes both Windows and SQL Server. 

Automated patching and automated backup depend on an extension called the SQL Server IaaS Agent:

You can of course turn automated operations off, but it's worth noting that more automation takes place by default with the pre-built SQL Server images. Depending on your experience level and typical processes for your SQL Server machines, the automation may be a nice feature. Since automated patching makes testing more challenging, this is another decision for the DBA for how he/she wants to handle things for different types of SQL Server VMs.

Finding More Information

SQL Server on Azure Virtual Machines Overview

Provision a SQL Server Virtual Machine in the Azure Portal

Performance Best Practices for SQL Server in Azure Virtual Machines

Overview of SSAS Tabular in DirectQuery Mode for SQL Server 2016

How the queries passed from SSAS to the DB engine is very interesting. First, however, let's review the basics of DirectQuery mode in SQL Server Analysis Services (SSAS). All information is as of RC1 (release candidate 1) for SQL Server 2016.

Two Modes for SSAS Tabular Models

When developing an SSAS Tabular model, you can choose one of two options for handling the underlying data:

In-Memory Mode (aka Imported Mode).  Stores the data in the in-memory model, so all queries are satisfied by the data imported into the Tabular model's storage engine. This requires the model to be processed for updated data to become available for reporting. This mode is conceptually analogous to MOLAP in SSAS multidimensional models (though the SSAS architecture differs significantly).

DirectQuery Mode. Leaves the data in the source and sends the queries to the underlying database. In this case there's no processing and SSAS serves as a semantic model to improve the user experience. This mode is conceptually analogous to ROLAP in SSAS multidimensional models (though there are architectural / implementation differences between DirectQuery and ROLAP).

Note that the entire SSAS Tabular model is specified as one of the two modes.

Use Cases for SSAS Tabular in DirectQuery Mode

Most SSAS Tabular models do run in In-Memory mode. However, there are some specific situations where DirectQuery is a viable option.

Near Real-Time Reporting. Low latency for data availability is the primary use case for DirectQuery. Because there's no processing time associated with populating data in SSAS, there's less delay in making data available. SSAS only compiles and deploys metadata. In this situation, you'll want to plan for how to handle contention of read and write activity.

Large Datasets Which Exceed Memory Available. Another situation to consider DirectQuery is if your server doesn't have enough memory to store all of the data. The rule of thumb for storing data in a Tabular in-memory model is to have 2.5x memory, so if your in-memory model size is 50GB, you would require about 125GB of RAM on the server. Therefore, for large datasets that are difficult to fit in memory, DirectQuery can be appealing. In the absence of model processing, SSAS would have much less CPU and memory demands when running in DirectQuery mode (and conversely, the underlying DB engine needs to be beefier).

Relational Database Highly Tuned for Ad Hoc Queries. If you've invested time creating clustered columnstore indexes, partitions, or other query tuning efforts in the underlying database engine, you may be inclined to utilize the source database for live query activity. In that case, the value of SSAS is the semantic layer (i.e., if you have users connecting to the data source and developing their own reports, it is significantly easier for users to navigate an SSAS model where everything is organized and no joins are required).

Most or All Logic Comes from Underlying Database. Only the most straightforward DAX calculations will be able to convert to SQL. If you tend to pass most everything you need into SSAS from the source database, and your SSAS calculations are simple, you should be able to consider DirectQuery mode. 

Updates to SSAS Tabular in DirectQuery Mode in SQL Server 2016

Improvements to DirectQuery in SQL Server 2016:

  • Performance improvements related the queries that are generated. Queries are less verbose now.
  • Additional relational data sources are now supported. Now the options include including SQL Server, APS (PDW), Oracle, and Teradata.
  • Support for Row-Level Security in the underlying data source, which is a new feature in SQL Server 2016.
  • Calculated columns in the Tabular model are now permitted when in DirectQuery mode. The computation will be pushed down to the database server. 
  • Excel is now available as a tool to use with DirectQuery mode. Prior to SQL Server 2016, Excel couldn't be used since Excel passes MDX rather than DAX. Note that there's still some limitations which are listed here: https://msdn.microsoft.com/en-us/library/hh230898.aspx.
  • Conversion for some of the time intelligence calculations are now supported.

Other Changes to DirectQuery in SQL Server 2016:

  • Deployed mode (as seen in SSMS rather than SSDT) is now a more 'pure' DirectQuery mode. The options for 'DirectQuerywithInMemory' and 'InMemorywithDirectQuery' appear to be gone. 
  • Sample partitions can be created to be able to see some data inside of the SSDT development environment.

Limitations of DirectQuery in SQL Server 2016:

  • In SSAS Tabular, an entire model is either set to Import (In-Memory) or DirectQuery. This is different from SSAS Multidimensional which allows partitions & dimensions to be set individually. 
  • A single data source is permitted. DirectQuery will not issue cross-database queries.
  • Calculated tables are not supported when the model is in DirectQuery mode. The workaround for this is to import a table multiple times (which is what is typically done to handle role-playing dimensions in a Tabular model).
  • Not all DAX functions can be translated to SQL syntax, so there are some DAX restrictions. More info is here: https://msdn.microsoft.com/en-us/library/hh213006.aspx.

To take advantage of all of the new improvements, the model must be set to compatibility level 1200.

 

Finding More Information

MSDN - DirectQuery Mode (SSAS Tabular)

Technical White Paper - Using DirectQuery in the Tabular BI Semantic Model