Where Azure Analysis Services Fits Into BI & Analytics Architecture (Part 3)

This is part 3 of a discussion about a new service in Microsoft Azure: Azure Analysis Services. Azure AS is a Platform-as-a-Service (PaaS) offering which is in public preview mode (as of December 2016).

Part 1: Why a Semantic Layer Like Azure Analysis Services is Relevant

Part 2: Use Cases for Azure Analysis Services

Part 3: Where Azure Analysis Services Fits Into BI & Analytics Architecture {you are here}

From an infrastructure perspective, there are a variety of ways you can use Azure Analysis Services. Basically, you can think of it as a "mix & match" of the following 3 options:

(1) Location of Assets

  • Hybrid. This is most common for companies which are extending their existing infrastructure.
  • All cloud-based. Utilization of cloud infrastructure, such as Azure services, is most common for brand new companies.
  • All on-premises. N/A for Azure AS - you'll want to use SQL Server Analysis Services instead of Azure Analysis Services for a fully on-premises implementation.

(2) Data Sources

  • From a single source such as a data warehouse. This is the most traditional path for BI development, and it still has a very valid place in many BI/analytics deployments. This scenario puts the data integration work in the ETL process that loads the data warehouse, which is the most appropriate place for it.
  • Directly from various systems. This can be done, but it works well only in specific cases - it definitely won't work well if there are a lot of highly normalized tables, or if there's not a straightforward way to relate the disparate data together. Trying to go directly to the source systems & skip an intermediary data warehouse puts the "integration" burden on the model design in Analysis Services, so plan for plenty of testing time if you're going to try this route (i.e., it can be much harder, not easier). Note that this option only makes sense if the data is stored in Analysis Services, because it needs to be related together somehow (i.e., DirectQuery mode, discussed next in #3, with > 1 data source won't work if a user tries to combine data sources because the data is not inherently related).

There is one twist to the data source options, and that is the use of federated queries. See the last example at the end of this post for more on that.

(3) Data Storage

  • Stored (cached) in the AS data model. Data is imported and stored in the in-memory model. In this case, we're using Analysis Services for its in-memory database as well as a semantic layer. This requires a scheduled refresh and provides the best performance.
  • DirectQuery. Data is *not* stored in Analysis Services; rather, AS is basically a semantic layer only, wherein all queries are actually sent to the underlying data source(s). This is useful when near real-time data is desired. However, unless your source system is tuned for it, performance may not be acceptable.

Note that the property which specifies whether the AS model uses DirectQuery is associated with the .bim file. This means that the entire model is either DirectQuery or it's not (you can't choose on a data source-by-data source basis or a table-by-table basis).
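If you go with the cached option, the scheduled refresh can be scripted. Here's a minimal sketch using the Analysis Services PowerShell cmdlets (Invoke-ASCmd) and a TMSL refresh command - the server URI and database name are placeholders, and you'll want to verify the exact cmdlet parameters and authentication options for your module version:

```powershell
# Minimal sketch: trigger a full refresh of a cached Azure AS model with TMSL.
# Assumes the SQL Server Analysis Services cmdlets (Invoke-ASCmd) are installed
# and support Azure AS; the server URI and database name are placeholders.

$tmsl = @"
{
  "refresh": {
    "type": "full",
    "objects": [ { "database": "SalesTabularModel" } ]
  }
}
"@

# Prompt for an Azure AD account that is an administrator on the AS server.
$cred = Get-Credential

Invoke-ASCmd -Server "asazure://southcentralus.asazure.windows.net/myaasserver" `
             -Credential $cred `
             -Query $tmsl
```

A script like this could then run from whatever scheduler you prefer (an on-premises job, or something like Azure Automation) to meet your data latency requirements.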

 

Below are some common scenarios (note that not all possible combinations are depicted, but enough to give you ideas).


In this first scenario, we have a traditional data warehouse which has integrated data from four different source systems. The data warehouse resides in an on-premises server, and the Analysis Services semantic layer resides in Azure. Data in Analysis Services is refreshed on a schedule. Reporting is primarily handled through the semantic layer to improve the user experience.

The data sources & Visual Studio pieces are removed from the rest of the examples to simplify the diagrams (though they certainly still pertain).


The following depicts using Azure AS in DirectQuery mode back to the data warehouse. In this case, the DAX or MDX (whichever is passed from the client tool) is converted to SQL and sent to the data warehouse through the gateway. Data is then retrieved and sent back securely to the client tool.


Alternatively, the DirectQuery mode could be directed at a cloud-based data warehouse. Note in this scenario that a gateway is not necessary (because AS and the data warehouse both reside in Azure).


The next scenario varies only in that the data is stored in Analysis Services, and refreshed on a schedule.


Lastly, we have the concept of federated queries. PolyBase allows us to define an "external table" in SQL Server or Azure SQL Data Warehouse and reach into data stored in Azure Blob Storage (Azure Data Lake Store support is coming soon). These external tables are known as "schema on read" because the data isn't physically stored in the data warehouse.

In the following example, we are using Analysis Services in DirectQuery mode directed to the data warehouse. Under the covers, since an external table is involved, the user queries will actually reach back farther to get the Equipment Health Data as well. The users don't have to know that's happening, though they do need to be willing to accept slower performance. Federated queries like this offer great flexibility to avoid or delay data integration for data analysis which is infrequent (if the data is frequently accessed, or has progressed beyond proof of concept, you likely want to implement full data integration).
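To make the external table idea a bit more concrete, below is a rough sketch of the PolyBase objects involved, wrapped in a call to Invoke-Sqlcmd. All of the names (storage account, credential, the Equipment Health Data columns) are made up for illustration, the DDL options differ a bit between SQL Server 2016 and Azure SQL Data Warehouse, and you may need to run the statements in separate batches:

```powershell
# Rough sketch: define a PolyBase external table over Azure Blob Storage, then
# it can be queried like any other table. Names, columns, and keys are placeholders.

$ddl = @"
-- A database master key must already exist before creating a scoped credential.
CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
WITH IDENTITY = 'storageuser', SECRET = '<storage account access key>';

CREATE EXTERNAL DATA SOURCE AzureBlobStore
WITH ( TYPE = HADOOP,
       LOCATION = 'wasbs://sensordata@mystorageacct.blob.core.windows.net',
       CREDENTIAL = BlobStorageCredential );

CREATE EXTERNAL FILE FORMAT CsvFormat
WITH ( FORMAT_TYPE = DELIMITEDTEXT,
       FORMAT_OPTIONS ( FIELD_TERMINATOR = ',' ) );

-- 'Schema on read': the data stays in blob storage; only metadata lives in the DW.
CREATE EXTERNAL TABLE dbo.EquipmentHealthData
( EquipmentID INT, ReadingDateTime DATETIME2, SensorValue DECIMAL(18,4) )
WITH ( LOCATION = '/equipment-health/',
       DATA_SOURCE = AzureBlobStore,
       FILE_FORMAT = CsvFormat );
"@

Invoke-Sqlcmd -ServerInstance "myserver.database.windows.net" -Database "EDW" `
              -Username "sqladmin" -Password "<password>" -Query $ddl
```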

Though I didn't depict every possible combination, hopefully this gives you a good idea of ways to use Analysis Services, and where it can fit into your existing architecture. As you can tell from Part 1 of this series, I'm a big fan of using a semantic layer for a consistent and friendly end-user experience.

You Might Also Like...

Overview of SQL Server Analysis Services Tabular in DirectQuery Mode for SQL Server 2016

Building Blocks of Cortana Intelligence Suite in Azure

Power BI Features End-to-End

Resolving 'Identity Not Found' Error When Provisioning Azure Analysis Services

This week I ran into an issue when trying to provision Azure Analysis Services (AAS) in Azure. The AAS team was fantastically helpful in getting me to understand what was going on.

When provisioning Azure Analysis Services in the portal, the final selection is to specify who the administrator is. AAS requires the administrator to be an account (user or group) which exists in Azure Active Directory (AAD). Because this was in a sandbox area for testing purposes, I went ahead and used my personal account for the AAS administrator selection:

 

After making the selections in the portal, the provisioning failed. In looking at the details, the error was: The identity emailaddr@domain.com was not found in Azure Active Directory:

This error was puzzling because of course I'm in Azure Active Directory. It turns out the tenant I was using to provision the service (the Development tenant in the diagram below) really only contains guest accounts:

(diagram: Azure tenant summary)
 

The Analysis Services product team explained to me that a user from a tenant which has never provisioned Azure Analysis Services cannot be added to another tenant's provisioned server. Put another way, our Corporate tenant had never provisioned AAS, so the Development tenant could not do so via cross-tenant guest security.

One resolution for this is to provision an AAS server in a subscription associated with the Corporate tenant, and then immediately delete the service from the Corporate tenant. Doing that initial provisioning works the magic behind the scenes and makes the tenant known to Azure Analysis Services. Then we can proceed to provision it for real in the Development tenant.

Another resolution is to use an AAD account which comes directly from the Development tenant as the AAS administrator.
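If you prefer scripting over the portal, provisioning can also be done with the AzureRM.AnalysisServices PowerShell module. Here's a rough sketch - the resource group, server name, location, SKU, and administrator UPN are all placeholders, and the key point is that the administrator account has to resolve correctly in the tenant (per the error above):

```powershell
# Rough sketch: provision an Azure Analysis Services server via PowerShell.
# Requires the AzureRM.AnalysisServices module; all names below are placeholders.

Login-AzureRmAccount   # sign in to the subscription tied to the intended tenant

New-AzureRmAnalysisServicesServer -ResourceGroupName "rg-analytics" `
                                  -Name "myaasserver" `
                                  -Location "South Central US" `
                                  -Sku "S1" `
                                  -Administrator "admin.user@developmenttenant.com"
```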

You Might Also Like...

Why a Semantic Layer Like Azure Analysis Services is Relevant

Use Cases for Azure Analysis Services (Part 2)

This is part 2 of a discussion about a new service in Microsoft Azure: Azure Analysis Services. Azure AS is a Platform-as-a-Service (PaaS) offering which is in public preview mode (as of November 2016).

Part 1: Why a Semantic Layer Like Azure Analysis Services is Relevant

Part 2: Use Cases for Azure Analysis Services {you are here}

Part 3: Where Azure Analysis Services Fits Into BI & Analytics Architecture

Because Azure AS is a cloud service, it has several meaningful benefits we've come to associate with this type of PaaS service:

  • Ability to scale up or down to increase performance when the demand is there (yet not pay for that level of performance when it's not required)
  • Ability to pause to save money if no queries are issued during a particular period of time (such as overnight on Dev/Test databases)
  • No need to provision hardware or handle upgrades and patching
  • Inherent redundancy of the data

Note: at this early time of the public preview, not all of the features mentioned in this post are available just yet, but they are coming.

Azure SSAS May Be Beneficial For...

Upsizing Data Size from Power BI

If you import the data into a Power BI data model (vs. issuing queries via DirectQuery mode), the imported data is currently limited to 1GB if you want to upload the file to the Power BI Service. This 1GB is after the data has been compressed into its in-memory columnar format. The columnar compression is very efficient, so it can contain quite a bit of data, but 1GB certainly doesn't accommodate enterprise-level data models (again, I'm referring to imported data, not DirectQuery scenarios). Upsizing to Analysis Services changes all that, since it can handle larger data volumes imported to the in-memory data model.

Faster Data Refresh Schedules than Power BI

Currently you can set a Power BI dataset (which has been imported) to refresh up to 4 times per day. If your data latency requirements dictate fresher data than that, then Analysis Services can be scheduled more frequently. (Just like the previous item, this refers to imported data and not DirectQuery models.)

Varying Levels of Peak Workloads

Let's say during month-end close the reporting activity spikes much higher than during the rest of a typical month. In this situation, it's a shame to provision hardware that is underutilized for a large percentage of the rest of the month. This type of scenario makes a scalable PaaS service more attractive than dedicated hardware. Do note that currently Azure AS scales compute, known as the QPU or Query Processing Unit level, along with max data size (which is different from some other Azure services, which decouple those two).
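When that month-end spike arrives, the scale-up (and the scale back down afterwards) can be scripted rather than handled manually in the portal. A quick sketch with the AzureRM.AnalysisServices cmdlets - the names and SKU tiers are placeholders:

```powershell
# Sketch: scale an Azure AS server up for month-end close, then back down.
# Server/resource group names and SKU tiers are placeholders.

# Scale up before the heavy reporting period.
Set-AzureRmAnalysisServicesServer -ResourceGroupName "rg-analytics" `
                                  -Name "myaasserver" `
                                  -Sku "S2"

# Scale back down once the peak has passed.
Set-AzureRmAnalysisServicesServer -ResourceGroupName "rg-analytics" `
                                  -Name "myaasserver" `
                                  -Sku "S1"
```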

User Activity Occurs During Fixed Hours Only

We will be able to pause the Azure AS service in order to save on charges. If you're a regional company with users who don't need to query the system from, say, 10pm to 7am, you'll be able to pause the service programmatically if you choose.
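For the fixed-hours scenario, the pause and resume can likewise be scheduled. A minimal sketch (placeholder names; schedule each half from whatever job scheduler you prefer, such as Azure Automation):

```powershell
# Sketch: pause the server overnight and resume it in the morning.
# Placeholder names; schedule each half from your preferred job scheduler.

# Evening job (e.g., 10pm): suspend the server to save on charges.
Suspend-AzureRmAnalysisServicesServer -ResourceGroupName "rg-analytics" -Name "myaasserver"

# Morning job (e.g., 7am): bring the server back online before users arrive.
Resume-AzureRmAnalysisServicesServer -ResourceGroupName "rg-analytics" -Name "myaasserver"
```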

Cloud or Hybrid Fits Your Strategic IT Direction

If you're purposely attempting to reduce the overhead of running a data center, then more and more service offerings like this one may be a fit. To the extent you're using multiple services such as Azure SQL Data Warehouse and/or Azure SQL Database, I'm hoping we're going to see some performance benefits (assuming you've selected Azure locations in close proximity to each other for the related services that pass data around).

You Are Short on Performance Tuning Expertise

I tend to get a little preachy that running workloads in the cloud does *not* mean your DBA can retire. However, it is certainly the case that there's less administrative oversight with cloud services. With this Azure AS PaaS service, rather than tuning your server specs, you would change the scale level for your instance in Azure - certainly a lot easier because there are far fewer "knobs" to adjust. Having said that, there are still *lots* of things to pay attention to: performance of queries against the original source still matters (if your data refresh window is small, or if you are using DirectQuery mode), and good design patterns are always *incredibly* important in order to achieve optimal performance from an Analysis Services model.

Getting New Features Fastest

We're going to be seeing new features hit Azure Analysis Services faster than SQL Server Analysis Services. If getting the latest and greatest quickly appeals to you, the AAS cloud service offers that advantage over the SSAS service which is part of the SQL Server box product.

Next up is Part 3: Where Azure Analysis Services Fits Into BI & Analytics Architecture

Finding More Information

Kasper de Jonge's blog: Analysis Services in Azure, When and Why (I had 95% of this post written before Kasper published his, so I decided to publish mine anyway even though they're fairly similar.)

Azure Documentation - What is Azure Analysis Services?

Feedback to the Product Team - Feedback on Azure AS

Channel 9 Videos: Azure AS Videos

You Might Also Like...

Building Blocks of Cortana Intelligence Suite in Azure

Power BI Features End-to-End

Why a Semantic Layer Like Azure Analysis Services is Relevant (Part 1)

This is part 1 of 3 about Azure Analysis Services (Azure AS) which was announced a few days ago. Azure AS is a Platform-as-a-Service (PaaS) offering which is currently in public preview mode at the time of this writing (late October 2016).

Part 1: Why a Semantic Layer Like Azure Analysis Services is Relevant {you are here}

Part 2: Use Cases for Azure Analysis Services

Part 3: Where Azure Analysis Services Fits Into BI & Analytics Architecture

Fundamentally, Analysis Services serves as a semantic layer (see below for further discussion of a semantic layer). Because the business intelligence industry now embraces an array of technology choices, sometimes it seems like a semantic layer is no longer valued like it once was. Well, my opinion is that for many businesses, a semantic layer is tremendously important to support the majority of business users who do *not* want to do their own data wrangling, data prep, and data modeling activities.

 
 

Let's first do a super quick recap of what Analysis Services is.

Analysis Services Primer

Analysis Services can be categorized with competitor tools such as Business Objects Universe, Cognos Framework Manager, or AtScale.

Prior to the introduction of Azure AS, Analysis Services was available only as part of the SQL Server stack. Therefore, we now have SSAS, which is a component of SQL Server, and Azure AS, which is a separate cloud-based offering. The main difference is in where the model is deployed; the development experience is exactly the same. Prior to the release of Azure AS (PaaS), the only way to run Analysis Services in Azure was within a virtual machine (IaaS). If you're searching for information online, be a little careful and watch the dates; you will easily run into older articles which are talking about running SSAS in a VM rather than this PaaS service. The good news is that nearly all of that information will still be relevant, since the development experience remains in SQL Server Data Tools; the primary difference is in deployment.

There are two modes to an Analysis Services instance: Tabular and Multidimensional.

  • Tabular is the newer in-memory data structure, based on DAX (Data Analysis eXpressions). Generally speaking, Tabular has an easier entry point though there is *definitely* a learning curve to DAX if you go beyond the simple stuff. 
  • Multidimensional is traditional OLAP so it's been around a lot longer, has a few more advanced features, and is based on MDX (MultiDimensional eXpressions). Multidimensional models are less 'forgiving' than Tabular models with respect to certain design characteristics (such as classifying facts and dimensions).

The rule of thumb most BI folks follow these days is to use a Tabular model unless you need one of the features that Multidimensional provides that isn't in Tabular yet. For this public preview of Azure AS, Tabular is the only mode supported initially. The product team has stated that Multidimensional will be considered for a future release, based on customer demand.

Regardless of whether you use Tabular or Multidimensional, your data is handled in one of two ways:

  • It can be imported into the Analysis Services in-memory model, which is also sometimes called cached mode. In this mode, Analysis Services holds a copy of the data and serves up the data for user queries. An in-memory model provides an excellent user experience, assuming that enough memory is provided and the design is sound. The data contained in an in-memory model needs to be refreshed on a schedule, so there's always some data latency.
  • The other option is to utilize DirectQuery mode. In this mode, the queries are passed on to the underlying data source, so Analysis Services really is metadata only. To get optimal performance with DirectQuery mode, the data source needs to be tuned for the performance of user queries, which is sometimes a challenge. The appeal of DirectQuery mode is access to real-time data, and not needing to populate another copy of the data in Analysis Services.

Analysis Services development is usually handled by BI/Analytics/IT professionals within SQL Server Data Tools in Visual Studio, and it is typically under source control such as TFS or GitHub. No changes to the development experience occur as part of the introduction of Azure AS. This is a good thing, but it's important to be aware that development of a semantic model does take some time. It's fast and easy to create an Azure AS instance in Azure, and it's fast and easy to deploy the model, but development of the model itself takes time and know-how. The good news is there are tons of resources available for learning development techniques, such as dimensional modeling, which is optimal for reporting.

Analysis Services is considered an enterprise-level data modeling tool. Ideally it exposes a large amount of your corporate data to business users. The pricing of Azure AS reflects that it is an enterprise-level tool (you can see pricing in the Azure portal when you provision the service).

Benefits of a Semantic Layer

According to b-eye-network: "A semantic layer is a business representation of corporate data that helps end users access data using common business terms."

A semantic layer sits between the original database and a reporting tool in order to assist business users with ease of reporting. It becomes the main entry point for data access for most business users when they are creating reports, dashboards, or running ad hoc queries.

 

Traditionally, we've built a semantic layer on top of a data warehouse. That's certainly not a requirement, though it's still a very valid and common design pattern. Later in this series we'll talk about some variations on that theme to accommodate a modern data warehouse platform.

Though exposing reporting to users via a semantic layer is not an absolute "must" in today's world of options, it facilitates reporting for things such as:

  • Data pre-integrated for users (ex: Salesforce data, invoice data, A/R data, and inventory data are all integrated for the end user to consume)
  • No joins or relationships for users to worry about (because they've all been handled in the data model)
  • Columns have all been renamed into business user-friendly names (ex: Invoice Amount instead of INVAMT)
  • Business logic and calculations have been centralized in the data model (which reduces the risk of calculations being done incorrectly)
  • Time-oriented calculations are included which are really powerful (ex: Sales Increase Year Over Year; Sales Year-To-Date; % of Change Since Prior Quarter; see the query sketch after this list)
  • Aggregation behavior has been set so reporting tools respond correctly (ex: sales $ sums up, but something like a customer number or invoice number does not try to aggregate)
  • Formatting has been specified so reporting tools handle it by default (ex: sales displays with the $ sign whereas units display with commas and no pennies)
  • Data security is incorporated (ex: standard row-level security, and/or exposing certain sensitive measurements to only authorized users)

In summary, a semantic layer is all about the convenience of business users so they can focus on getting the answers they need. It facilitates self-service reporting by providing a cleansed, understandable, trustworthy environment. A semantic layer can often support self-service analysis for 80% of a user base. After all, most functional users want to get the information quickly and then get on with their day. A semantic layer can also help the other 20% of users who have additional needs, though in all fairness, those hard-core data analysts and data scientist types often prefer to go after the original source data. There's certainly still value in them acquiring some of their data from the semantic layer as well - why reinvent the wheel if the investment has been made?

So, even though we have tons of options these days for information delivery, and a single semantic layer for an entire organization isn't a silver bullet, I firmly believe that a semantic layer is certainly still a great tool in the toolbox. 

Next up is Part 2: Use Cases for Azure Analysis Services

Finding More Info

Azure Documentation - What is Azure Analysis Services?

Feedback to the Product Team - Feedback on Azure AS

Channel 9 Videos: Azure AS Videos

You Might Also Like...

Data Lake Use Cases and Planning Considerations

Building Blocks of Cortana Intelligence Suite in Azure