***Note this post applies to Power BI for Office 365 (deprecated as of 12/31/2015). Though some concepts remain the same or similar, all details are not necessarily applicable to the new Power BI V2 service.***
Recently I've been chatting with some fellow BI professionals about the Data Management Gateway (DMG) in Power BI and when exactly it's required to be used.
Before we answer this question, let's go through a quick bit of background...
What is the Data Management Gateway?
There are 3 components involved, all of which are set up by the Power BI administrator:
- Data Management Gateway (DMG). This is a client agent installed on the on-premises server. The DMG handles encryption, compression, and transmission of data using a service bus. Communicates with O365 using a key provided by the Power BI Gateway.
- Power BI Gateway. This is set up in the Power BI Admin Center (in O365). Handles communication with an on-premises server.
- Power BI Data Source. This is set up in the Power BI Admin Center (in O365). Connects to a database via the Gateway.
When the administrator creates a Data Source in the Power BI Admin Center, you have 2 choices: To allow cloud access and/or to create an OData feed. If cloud access is allowed, that will permit users of the Power BI app to schedule data refreshes that point to that database. If you create an OData feed, that will expose tables and views to organizational search using Power Query.
When is the Data Management Gateway Required?
SQL Server Database On-Premises: Yes
Oracle Database On-Premises: Yes
Windows Azure SQL Database (WASD): No*
SQL Server Database in an Azure VM: No*
Public OData Sources: No
Each of the * items are cloud services associated with Azure. For those types of data sources, Power BI permits the scheduled data refresh to occur using stored credentials in the workbook, or using the secure store service, without requiring the 3 components listed above to be set up by the administrator. As of now (March 2014), it's currently just the on-premises data sources which require the 3 components listed above.
The advantage of not having to set up the DMG is that it's less steps to set up. If you're using a cloud service as a way to quickly and cheaply set up a data source for temporary use or a one-time analysis, the simplicity is nice.
One disadvantage of not having to set up the DMG is that the Power BI system administrator isn't seeing the data refresh activity occur in the Power BI Admin Center. As of this release of Power BI, I'm not seeing where data refreshes from cloud sources are being logged. In a managed self-service BI environment, this visibility to what's really happening in the system is very important.
Another disadvantage is that it's a different user experience. Let's say User Emily upload a new workbook which has a data source from SQL Server on-premises. Her scheduled data refresh will fail unless the Power BI administrator has set up the 3 components listed above. However, if User Emily uploads a new workbook which has a data source from an Azure SQL Database, her scheduled refresh will be successful without these extra steps. Since users aren't always aware of where data sources are, this can lead to an inconsistent experience where they're not exactly sure when it "just works" versus when to contact the Power BI administrator for additional help.
Having said all that, I'm sure we all expect that much of how things work will evolve as Power BI matures and grows. It is, after all, in its first release. Am just passing along how I've observed that it's working currently so administrators and users are aware of the behavior.