What is the Cortana Analytics Suite?

Since I’m a data nut, I’m intrigued with Microsoft’s new offering referred to as the Cortana Analytics Suite (which I’ll call CAS for short).

First things first, CAS is not a product in and of itself, though it will have its own pricing. CAS can be thought of as a bundle of integrated products and services. It’s somewhat similar to the idea of the Office suite or the SQL Server suite, both of which contain various components that are interoperable (at least to a certain extent). I get the feeling with CAS that interoperability/integration will be a huge emphasis. Another big emphasis will be on the availability of templates and preconfigured solutions in CAS, which should accelerate and simplify development for particular scenarios.

Since CAS isn’t officially available yet, most of what can be found right now are marketing materials – though most of the components are available individually now and have varying levels of technical documentation available. I’m excited to be attending the CAS Workshop in September in Seattle, where I’m hoping to learn a lot more about the integration points, interoperability, accelerators, and overall capabilities.

What are the Components of Cortana Analytics Suite?

Given that this is a bundle of tools for advanced analytics, with an emphasis on integration and automation, what are the components of the suite?

The documentation lists the following as elements of Cortana Analytics Suite:

  • Azure Machine Learning
  • Azure HDInsight
  • Azure Stream Analytics
  • Azure Data Lake
  • Azure SQL Data Warehouse
  • Azure Data Catalog
  • Azure Data Factory
  • Azure Event Hubs
  • Power BI
  • Cortana
  • Face, vision, speech and text analytics
  • Preconfigured solutions for recommendations, forecasting, churn, etc.

There are other Azure components that will play a part in data-oriented solutions as well; I’m showing some of these key components in the image above (in orange towards the bottom) even though they aren’t “officially” part of Cortana Analytics Suite.

Why is Cortana in the Name?

One of my first questions when this was announced: why is Cortana in the name? The idea here is that the personal assistant, Cortana, will be able to provide information upon request or proactively. A request such as “Hey Cortana, what is the total of yesterday’s sales?” appears to be the next evolutionary step of the Q&A natural language capabilities first seen in Power BI. A public demo indicated that Power BI will be just one way to expose data to Cortana.

Source for image: July 2015 Webinar by Joseph Sirosh

Here’s a very interesting quote from a TechCrunch article:

“As for Cortana, which is the Microsoft voice-driven personal assistant tool in Windows 10, it’s a small part of the solution, but Sirosh says Microsoft named the suite after it because it symbolizes the contextualized intelligence that the company hopes to deliver across the entire suite.”

So, we have an extremely broad platform with Cortana Analytics Suite. Stay tuned for my follow-up posts where we start looking at the individual components.

Ways to Utilize Power BI in a Bimodal BI Environment

If I had been writing this six months ago, I would have said Power BI is for self-service BI. End of story. Well, things aren’t that simple anymore with Power BI V2. The way I see it, there are two main ways to utilize Power BI:

Report Only

This method utilizes only the reporting & dashboarding capabilities of Power BI. The source data stays where it’s located and is *not* replicated within the Power BI data model (aka Power Pivot). This approach requires a “direct connection” to the underlying data source and data refresh is handled at the source rather than in Power BI.

As of now (early August 2015), there are four data sources which support direct connection to Power BI (thus allowing the ‘Reporting Only’ option without requiring an intermediary data model):

  • Analysis Services Tabular Model
  • Azure SQL Database
  • Azure SQL Data Warehouse
  • Spark on HDInsight

The ‘Reporting Only’ method is appropriate for:

  • Reporting on higher data volumes (because the size limit for a Power BI model is currently 250MB compressed)
  • Reporting when row-level security is a priority (for example: Power BI can utilize the security roles defined in an SSAS Tabular Model to implement row-level security by using Effective User Name)
  • Reporting on data sources which already exist, when there’s no requirement to join to other independent data sources

Query, Model, and Report

With this approach, the data is replicated and stored in a data model. This data model can be created in either Excel (using Power Pivot) or via the Power BI Designer. The replicated data is typically refreshed on a regular schedule. This is the way we’ve thought of the Excel add-ins of Power Query, Power Pivot, and Power View for some time now.

The Query>Model>Report approach is most useful for:

  • One-time or infrequent analyses (ex: a one-time analysis of whether to open a new store)
  • Mashing up data from multiple sources (ex: financial data from an internal data warehouse plus external industry metrics)
  • Small, self-service projects (i.e., those which cannot be justified to add to a larger, centralized solution)
  • Use of the Power BI SaaS connectors (because it stores and refreshes the data in a model)
  • Use of the Power BI APIs (because it stores and refreshes the data in a model)
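To give a feel for why the APIs require a data model: the Power BI REST API pushes rows into a dataset hosted by the service, and the service stores them in that model. Here's a rough sketch in JavaScript. The dataset ID, table name, and token are placeholders of my own invention; the general request shape (a JSON body with a `rows` array, POSTed to a dataset's table) is how the push-data API is documented, but check the current docs before relying on it.

```javascript
// Sketch of pushing rows into a Power BI dataset via the REST API.
// DATASET_ID, the "Sales" table, and ACCESS_TOKEN are placeholders --
// a real call needs an Azure AD token and a dataset created via the API.

// The push-data API expects a body of the form { "rows": [ {...}, ... ] }.
function buildAddRowsBody(rows) {
  return JSON.stringify({ rows: rows });
}

var body = buildAddRowsBody([
  { Product: "Widget", SalesDate: "2015-08-01", Amount: 120.5 },
  { Product: "Gadget", SalesDate: "2015-08-01", Amount: 80.0 }
]);

// A real call would look roughly like:
//   POST https://api.powerbi.com/v1.0/myorg/datasets/DATASET_ID/tables/Sales/rows
//   Authorization: Bearer ACCESS_TOKEN
//   Content-Type: application/json
// ...with `body` as the payload.

console.log(body);
```

Because the rows land in a model stored by the service, refresh and retention are handled there, which is why I group the APIs under the Query>Model>Report approach.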

Bimodal BI: Self-Service and Corporate BI

I’m really happy to see V2 maturing and evolving. We already know that it’s a powerful self-service BI tool. At this point it can be used for *certain* projects that might be considered corporate BI:

  • The ‘Reporting Only’ approach with direct connectivity to a centralized data source
  • The Power BI SaaS connectors, particularly if used in conjunction with groups and organizational content packs
  • The Power BI APIs, particularly if used in conjunction with groups and organizational content packs

Just how am I differentiating self-service BI from corporate BI, you ask? Great question. I see self-service BI as being driven by business users, with less governance and a focus on freedom of exploration. Self-service can mean a lot of things (like parameterized reports delivered to users from IT), but in the context of Power BI, we tend to think of the functional user handling the cleansing, shaping, modeling, and reporting, though in practice that’s not always true. The definition of self-service isn’t the same in every company, or even across departments or business units.

Conversely, traditional corporate BI places a lot of emphasis on things like standardization, scalability, efficiency, reusability, and accuracy of data. Both of these worlds usually do exist, whether it’s formally sanctioned and planned or not.

Over time we’re going to continue seeing more integration points (particularly with Cortana Analytics Suite), more APIs to bring data in and out of Power BI, more SaaS connectors, and more direct connection abilities. I'm really interested in finding the best balance between corporate and self-service BI, so it’ll be interesting to watch as Power BI evolves with its capabilities to support both.

What is the New D3.js Visualization Engine for Power BI?

The other day someone asked me about the Silverlight dependency as it relates to Power BI. I got to talking about that and the new visualization engine underlying Power BI Desktop, and here we are with a blog post. This discussion is as of early August 2015.

New Capabilities for Power BI Desktop

The latest release of Power BI Desktop includes a new visualization engine based on D3, which stands for Data-Driven Documents.

D3.js is a JavaScript library of objects to produce sophisticated, interactive, dynamic data visualizations using modern web-based technologies. Translated...this means that D3 is the connection point between a user interaction and the data underneath, allowing a web page to dynamically change rather than remain static. As the d3js.org website states:

"D3 helps you bring data to life using HTML, SVG, and CSS."

It's based on open standards, which will be meaningful for customized Power BI visualizations (discussed in the next section). D3 has a ton of community support, documentation, and examples to help you get started with the framework. D3 is very flexible and customizable, though the tradeoff is a learning curve.
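The heart of D3 is the "data join": given the elements already on the page and a new data array, D3 works out which data items are new (the enter selection), which are already bound to an element (the update selection), and which elements no longer have data (the exit selection). The following is just a plain-JavaScript sketch of that bookkeeping, not the d3 library itself and not how D3 is actually implemented internally; it needs no DOM, so you can run it anywhere.

```javascript
// A plain-JavaScript sketch of D3's core idea, the data join.
// This is NOT the d3 library -- just the enter/update/exit bookkeeping
// that drives selectAll().data().enter() on a real web page.

function dataJoin(existingKeys, newData, keyFn) {
  var existing = {};
  existingKeys.forEach(function (k) { existing[k] = true; });

  var enter = [];   // data items with no matching element yet
  var update = [];  // data items bound to an element already on screen
  newData.forEach(function (d) {
    var k = keyFn(d);
    if (existing[k]) { update.push(d); delete existing[k]; }
    else { enter.push(d); }
  });

  var exit = Object.keys(existing); // elements whose data went away
  return { enter: enter, update: update, exit: exit };
}

// Bars currently on screen are keyed "A" and "B"; the new data drops "B"
// and introduces "C" -- so "C" enters, "A" updates, and "B" exits.
var join = dataJoin(
  ["A", "B"],
  [{ id: "A", value: 10 }, { id: "C", value: 7 }],
  function (d) { return d.id; }
);
```

In a real page, D3 would then append new elements for the enter selection, restyle the update selection, and remove the exit selection, all driven by the data.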

Ability to Create Custom Power BI Visuals

Microsoft has released an open source project on GitHub. This is the first step towards developers being able to create customized visuals for Power BI dashboards and reports. A lot of customers, partners, and ISVs are looking forward to this extensibility. This means that we'll no longer be limited to whatever visualizations are baked into Power BI Desktop. I'm sure we'll see some really innovative and interesting visuals. Oh, I'm certain we'll see some ugly ones too.

At this point (Aug 2015), Microsoft has released the source code and announced custom visuals ability is coming. There's no indication yet as to a date when Power BI custom visuals will be fully functional, but it probably won't be long since the Power BI team is moving at an amazing pace.

Getting Started With Custom Visuals for Power BI

There are a *lot* of tutorials available online for learning D3. Most of the educational materials say that it's helpful to also be familiar with the following:

  • HTML. Structural elements for web pages. Power BI is using HTML5.
  • CSS. Styling of web pages. Power BI is using CSS3.
  • JavaScript. An object-oriented programming language that allows the display to change based on user interactions. This is code inside the HTML that works with D3; everything ends up getting compiled down to JavaScript.
  • SVG, WebGL, Canvas, etc. (optional). Graphics formats which support functionality such as interactivity and animation. Power BI is flexible with which graphical API you prefer to use, but their open source project is all D3.

Regarding how these components fit together, here's a helpful quote from dashingd3js.com:

"D3.js helps you attach your data to DOM (Document Object Model) elements. Then you can use CSS3, HTML, and/or SVG to showcase this data. Finally, you can make the data interactive through the use of D3.js data-driven transformations and transitions."

I would add that at least some basic knowledge of good data visualization practices is also necessary. Effective visuals are extremely important for communicating information clearly and concisely. Just because it's pretty doesn't mean it's effective, right?

Power BI Components Using Silverlight 

So, given the new D3.js news, what does that mean for the elements that use Silverlight? Well, that's a good question. I don't believe Microsoft has announced timing for when Silverlight will be phased out. At this point, the following Power BI components still rely upon Silverlight:

  • Power View for Excel
  • Power View for SharePoint
  • Power Pivot Gallery in SharePoint

Finding More Information

d3js.org site

D3.js repository on GitHub

Power BI Visuals on GitHub  <--includes documentation and prerequisites

List of interesting libraries, plugins and utilities

List of D3 tutorials, techniques, blogs, books, courses, and videos

Will the Real Power BI Please Stand Up?

Power BI V2 went GA last week. This means it's no longer in "Preview" status; it's "Generally Available." Preview offerings are not intended to be used in Production environments, so now companies can begin formally converting over from Power BI for Office 365 (V1) to the new Power BI service (V2) when they're ready.

Different Ways Power BI Can be Utilized

If you are just beginning to research Power BI or haven't needed to pay extremely close attention to its evolution these past couple of years, you might find information online to be confusing or contradictory. The way I usually explain it is that there are four ways Power BI could be deployed, depending on which components you choose to use.

With option 1, you're using just the Excel add-ins (Power Query, Power Pivot, Power View) and/or just the Power BI Desktop application. Those files are saved and shared on the file system. Although it's not an end-to-end solution, this mode certainly does offer a lot of functionality.

Option 2 is the on-premises solution utilizing SharePoint. It's been around quite a while now. There are some meaningful differences between using full-fledged SharePoint vs. the online services. 

Option 3, Power BI for Office 365 (aka V1), is being retired in favor of Option 4, Power BI Dashboards (aka V2).

When discussing Power BI these days (mid-2015), people are usually talking about #4, which is the end-to-end solution. Although sometimes they could be talking about just option #1 without the web-based component. #2 is still a viable solution; it's just not the "hot" thing right now and doesn't get talked about much.
My purpose for writing this post is this... Please, please pay attention to the age of the article or blog you're reading and which product it relates to. If you're reading an older page about V1, chances are at least some of the information isn't accurate regarding V2. If you're reading about Power Pivot for SharePoint, very little will line up with the Power BI V2 Software-as-a-Service model. Even Power View functionality differs significantly between Excel and Power BI Desktop.