Tips for Using Azure Data Catalog

It seems there's a lot of interest in Azure Data Catalog from the customers that I work with. Since I've been discussing it a lot recently during projects, I thought I'd share a few thoughts and suggestions.

Register only Production data sources. Typically you won't want to register Development or UAT data sources. That could lead to users seeing what appears to be duplicates, and it also could lead to inconsistency in metadata between sources if someone adds an annotation to, say, a table in UAT but not in Production. 

 
 

Register only data sources that users interact with. Usually the first priority is to register data sources that the users see-for instance, the reporting database or DW that you want users to go to rather than the original source data. Depending on how you want to use the data catalog, you might also want to register the original source. In that case you probably want to hide it from business users so it's not confusing. Which leads me to the next tip...

Use security capabilities to hide unnecessary sources. The Standard (paid) version will allow you to have some sources registered but only discoverable by certain users & hidden from other users (i.e., asset level authorization). This is great for sensitive data like HR. It's also useful for situations when, say, IT wants to document certain data sources that business users don't access directly.

Use the business glossary to ensure consistency of tags. The business glossary is a capability of the Standard (paid) version and is absolutely worth looking into. By creating standardized tags in the business glossary, you'll minimize issues with tag inconsistency that would be annoying. For example, the business glossary would contain just one of these variations: "Sales & Marketing", "Sales and Marketing", or "Sales + Marketing".

Check the sources in the "Create Manual Entry" area of Publish if you're not seeing what you're looking for. There's a few more options available in Manual Entry than the click-once application.

Use the pinning & save search functionality to save time. For data sources you refer to often, you can pin the asset or save search criteria. This will display them on the home page at AzureDataCatalog.com so they're quicker to access the next time.

Use the Preview & Profile when registering data when possible. The preview of data (i.e., first X rows) and profile of data (ex: a particular column has 720 unique values that range from A110 to M270) are both extremely useful when evaluating if a data source contains what the user really wants. So, unless the data is highly sensitive, go ahead and use this functionality whenever you can.

 
 

Be a little careful with friendly names for tables. When you set a friendly name, that becomes the primary thing a user sees. If it's very different from the original name, it could be more confusing than helpful because users will be working with the primary name over in tools such as Power BI.

 
 

Define use of "expert" so expectations are clear. A subject matter expert can be assigned to data sources and/or individual objects. In some organizations it might imply owner of the data; however, in the Standard (paid) version there is a separate option to take over ownership. Usually the expert(s) assigned indicates who knows the most about the data & who should be contacted with questions. 

Be prepared for this to potentially be a culture change. First, it's a culture change for DBAs who are responsible for securing data. The data catalog may absolutely expose the existence of a data source that a user didn't know about--however, remember that it only exposes metadata and the original data security is still in force. The other culture change affects the subject matter experts who know the data inside and out. These folks may not be used to documenting and sharing what they know about the data. 

You Might Also Like...

Overview of Azure Data Catalog in the Cortana Intelligence Suite <--Check the "Things to Know about Azure Data Catalog" towards the bottom this post for more tips

How to Create a Demo/Test Environment for Azure Data Catalog