Why You Want to Use the Latest SQL Server Management Studio Release

We have two main built-in tools for interacting with a SQL Server database platform: SQL Server Management Studio (SSMS) and SQL Server Data Tools (SSDT). This post is about SSMS and why you may want to install the latest release...even if you're not ready to upgrade to SQL Server 2016.

SSMS is now a standalone tool which is updated on a monthly basis with bug fixes and new features. This is a big deal because it allows the tooling team to ship releases more quickly. The latest download file is easily located on the web.

Frequently you'll see it referred to as SSMS 2016, though technically speaking it is "version-less" now. It's also sometimes called SSMS 16.x, which correlates with the release numbering at this time. If you check the Help > About menu in SSMS to see what's installed, you'll need to translate the build number shown there to the release number, which can be found in the changelog.

SSMS is now based on the Visual Studio 2015 shell. This gives it some performance improvements and removes the dependency on installing .NET Framework 3.5 as a prerequisite. **For this reason, you will want to make sure Visual Studio 2015 is installed first followed by SSMS and SSDT.**

SSMS is supported for managing SQL Server 2008 through 2016 (except for SSIS instances, which sadly still require a version-specific SSMS at the time of this writing). If you manage numerous servers on different versions, this unification is fantastic. There is partial support for managing pre-2008 instances. And, as you'd expect, the newest SSMS release supports the new features in SQL Server 2016 such as Query Statistics, Live Query Plans, Compare Showplan, Security Policies for row-level security, and the rest of the new 2016 goodies.

SSMS also supports managing Azure components such as Azure SQL Database and Azure SQL Data Warehouse, as well as hybrid cloud features such as StretchDB and Backup to URL. This additional unification is really, really handy.

SQL PowerShell updates are also shipped with SSMS. In order to take advantage of this monthly release cadence, the latest and greatest SQL PowerShell module is now called "SqlServer", whereas the legacy module, which does *not* have the newest cmdlets released since mid-2016, is "SQLPS". More info here: https://blogs.technet.microsoft.com/dataplatforminsider/2016/06/30/sql-powershell-july-2016-update/.
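
If you want to check which of the two modules is in play on a machine, a quick PowerShell check like the following can help (a minimal sketch; what you see depends on which tools are installed):

```powershell
# Check which SQL PowerShell modules are available on this machine.
Get-Module -ListAvailable -Name SqlServer, SQLPS |
    Select-Object Name, Version, Path

# Explicitly import the newer module so the legacy SQLPS module
# isn't auto-loaded in its place.
Import-Module SqlServer

# See which cmdlets the SqlServer module exposes.
Get-Command -Module SqlServer | Sort-Object Name
```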

If you'd like to follow and vote for new feature requests, there is a SQL Server Community Trello board here: https://trello.com/b/M9NmFPfv/sql-server-management-studio-2016-enhancements.

Data Lake Use Cases and Planning Considerations

I've spent a lot of time this past year learning about and thinking about data lakes. Mostly I've been interested in how to integrate a data lake alongside existing investments without making a mess of things.

A data lake is one piece of an overall data management strategy. Conceptually, a data lake is nothing more than a data repository. The data lake can store any type of data. Cost and effort are reduced because the data is stored in its original native format with no structure (schema) required of it initially. 

Data lakes usually align with an "ELT" strategy, which means we Extract data and Load it into the data lake in its original format, then Transform it later *if* a need presents itself. An "ELT" strategy goes along with the "Schema on Read" technique that is prevalent in the big data world. It's become prevalent because you're able to store the data relatively easily with less up-front time investment, then query the data "where it lives" without being required to relocate it first (though you may want to later, once a solution is productionized).
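
As a small illustration of the schema-on-read idea, here's a hedged PowerShell sketch: the raw file lands in the lake with no structure applied, and the column names are supplied only at the moment the data is read (the file name and column names here are made up for the example):

```powershell
# Hypothetical example: a raw, headerless extract sits in the lake as-is.
# No schema was required in order to land it.
$rawFile = 'C:\datalake\raw\sales\salesextract_20161001.csv'

# Schema on read: column names are applied only when we query the file,
# not when it was stored.
$sales = Import-Csv -Path $rawFile -Header 'CustomerId','Region','SaleAmount','SaleDate'

# Now we can analyze the data in its structured form.
$sales | Group-Object Region |
    Select-Object Name, Count
```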

Conversely, traditional data warehousing typically follows an "ETL" strategy in which the Transformation occurs before the Load to the data warehouse. This is referred to as "Schema on Write" because the schema, i.e., the structure of the data, must be defined before the data is loaded to the data warehouse. This takes time and effort to do correctly. It also means we typically need to define use cases ahead of time in order to validate that the schema is correct. We can also expect the use cases and needs to evolve over time - this is a good thing, but can be challenging to implement sometimes.

Use Cases For a Data Lake

There are a variety of ways you can use a data lake:

  • Storage of all organizational data to support downstream reporting & analysis activities. Some organizations wish to achieve a single storage repository for all types of data. Frequently, the goal is to store as much data as possible to support any type of analysis that might yield valuable findings.
  • Ingestion of semi-structured and unstructured data sources such as equipment readings, telemetry data, logs, streaming data, and so forth. A data lake is a great solution for storing IoT (Internet of Things) type of data which has traditionally been more difficult to store. Optionally, you can also add structured data (i.e., extracted from a relational data source) to a data lake if your objective is a single repository of all data to be available via the lake.
  • Experimental analysis of data before its value or purpose has been fully defined. Agility is important for every business these days, so a data lake can play an important role in "proof of value" type of situations because of the "ELT" approach discussed above.
  • Archival and historical data storage. Sometimes data is used infrequently, but does need to be available for analysis. A data lake strategy can be very valuable to support an active archive strategy.
  • Preparation for data warehousing. Using a data lake as a staging area for a data warehouse is one way to utilize the lake, particularly if you are getting started.
  • Application support. In addition to analysis by people, a data lake can be a data source for a front-end application. The data lake might also act as a publisher for a downstream application (though ingestion of data into the data lake for purposes of analytics remains the most frequently cited use).

Planning a Data Lake

As we discussed above, a data lake reduces the up-front effort of getting data stored because we aren't absolutely required to structure it first. However, that doesn't mean there's no planning at all. There are various things you'll want to consider as you're planning a data lake -- to prevent it from becoming the dreaded data swamp -- including things such as:

  • Security around data access
  • Encryption requirements
  • Governance
  • Data quality
  • Master data management
  • Metadata management
  • Organization of data for optimal data retrieval
  • Scheduling and job management
  • Logging and auditing
  • To what extent data federation will be utilized

You definitely want to spend some time planning out how the data will be organized so that finding the data is as straightforward as possible. Just like with planning anything where data is stored (for example, a regular file share, or a SharePoint document library, etc.), you usually want to consider subject areas along with user groups and security boundaries. 


There are many different ways to organize a data lake. For batch data loads, here's one example:

Raw Data

    Organizational Unit (ex: East Division)

          Subject Area (ex: Sales)

               Original Data Source (ex: SalesForce)

                    Object (ex: Customer Contacts table)

                         Date Loaded


Key points about the "Raw Data" layer:

  • The raw data is usually considered immutable data (i.e., unchangeable). This allows you to go back to a point in time if necessary.
  • The speed at which data arrives should be a non-issue for the data lake. Therefore, you want your data lake to be able to accommodate a batch layer (data loaded from batch processes) and a speed layer (data loaded from streaming data or near real-time applications).
  • The raw data layer is often partitioned by date. If there's an insert/update date in the source that can be relied upon, you can organize incremental data loads by that date (see the sketch after this list for an example of a date-partitioned path). If there's not a reliable date to detect changes, then you need to decide if the source is copied over in its entirety every time the data load executes. This decision correlates with the extent to which you need to retain the history of changes over time -- you probably want to be liberal here since all requirements aren't identified upfront.
  • The raw data should be stored exactly as it exists in the source. This is much like how we would stage data for a data warehouse, without any transformations, data cleansing, or standardization as of yet.
  • Very, very few people have access to the raw data layer. Just like staging for a data warehouse is considered the back-end "kitchen" area, the raw data layer is usually hands-off for most users.
  • The above sample raw data structure emphasizes security by organizational unit. The tradeoff here is that a data source table may be duplicated if it's used by different organizational units. Therefore, a different type of structure might make more sense depending on your needs.
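
To make the date-partitioned organization a bit more concrete, here's a minimal sketch of how a raw-layer folder path might be composed during a batch load. The folder names follow the example hierarchy above and are illustrative only:

```powershell
# Illustrative only: compose a date-partitioned raw data path following the
# Organizational Unit > Subject Area > Source > Object > Date Loaded layout.
$loadDate = Get-Date -Format 'yyyy/MM/dd'

$rawPath = "/rawdata/EastDivision/Sales/Salesforce/CustomerContacts/$loadDate"

# e.g. /rawdata/EastDivision/Sales/Salesforce/CustomerContacts/2016/10/01
Write-Output $rawPath
```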

To permit data retrieval directly from the data lake, a separate layer is usually recommended for the curated, or specialized, data. It could be organized more simply, such as:

Curated Data

     Purpose (ex: Sales Trending Analysis)

          Type (ex: Transactional or Summarized)


Key points about the "Curated Data" layer:

  • User queries are from the curated data layer (not usually the raw data layer).
  • The curated data layer contains data for specific, known, purposes. This means that the curated data layer is considered "Schema on Write" because its structure is predefined.
  • Some data integration and cleansing can, and should, occur in the curated data layer.
  • It's not uncommon to restructure data to simplify reporting (ex: consolidate several related tables into one file).
  • To the extent that friendly names can be assigned, rather than cryptic names from a source system, that should be done to improve usability. Anything you can do to make it simpler to use is usually a good time investment in my mind.
  • Depending on how far you take it, the curated data layer could be set up similarly to a data warehouse, particularly if you take advantage of some of the Hadoop technologies available to augment the data lake infrastructure.
  • You might find that multiple logical data layers to deliver curated data makes sense for your purposes (similar to the concept of different data marts).

It's also very common to include a "sandbox" area in your data lake. This provides your highly trained analysts and data scientists an area where they can run processes and write data output.

Does a Data Lake Replace a Data Warehouse?

I'm biased here, and a firm believer that modern data warehousing is still very important. Therefore, I believe that a data lake, in and of itself, doesn't entirely replace the need for a data warehouse (or data marts) which contain cleansed data in a user-friendly format. The data warehouse doesn't absolutely have to be in a relational database anymore, but it does need an easy-to-work-with semantic layer that most business users can access for the most common reporting needs.

Lessons learned from analysis done in the data lake can often be applied to the data warehouse (the exact same thing we often say with regard to self-service BI efforts influencing and improving corporate BI initiatives). It is true that analysis can be more agile when conducted directly from the data lake because you're not constrained to a predefined schema -- anyone who has experienced shortcomings in the data warehouse will appreciate this. For instance, let's say the data warehouse relates the customer and region dimensions to a fact table. Then someone wants to count customers by region, even if the customer doesn't have any sales. In the DW there are at least a couple of different ways to handle it, but we need to build on the initial schema to do so (because dimensions usually only relate through facts, not directly to each other). Whereas when you start in the data lake with the question, such as customers by region, you have more freedom and flexibility. However, that freedom and flexibility come at a cost -- data quality, standardization, and so forth are usually not at the same maturity level as the data warehouse.

There are always tradeoffs between performing analytics on a data lake versus from a cleansed data warehouse: query performance, data load performance, scalability, reusability, data quality, and so forth. Therefore, I believe that a data lake and a data warehouse are complementary technologies that can offer balance. For fast, exploratory analysis by a highly qualified analyst or data scientist, the data lake is appealing. For delivering cleansed, user-friendly data to a large number of users across the organization, the data warehouse still rules.

A New Adventure at SQL Sentry

For the last 8 years, consulting roles have offered me a chance to learn and grow and expand my skills. Learning all of the time is really pivotal to my sense of happiness and satisfaction. In particular, BlueGranite has been an absolutely fantastic employer. They have a ton of really sharp people who are smart, motivated, and fun to be around. The leadership team truly values their technical staff, and they are trustworthy. I will miss a ton of people at BlueGranite (especially a few...you know who you are), but it's time for a new adventure.

As of October 3rd I'll be joining the SQL Sentry team as a BI Architect. I'll be the second BI-type of person to join their staff. The first is Jim Benton, who I worked with at a previous employer. In fact, this job change is all Jim's fault - he spoke so incredibly highly of SQL Sentry that I just had to look into this new role they created.

So what will I be doing? Since I don't yet have clarity on what's proprietary information and what's not, I'll just say that I'll be helping with BI and analytics for their product line, as well as internal analytics projects. Some of the projects in the pipeline are going to be seriously awesome.

If you're not familiar with SQL Sentry, they are a software development company. SQL Sentry creates monitoring and performance optimization software for SQL Server, Windows, and some Azure components which are related to SQL Server. They have a variety of paid offerings in their suite of tools, but you can get started for free with a fantastic tool called Plan Explorer.

As I write this I'm enjoying a week of leisurely life in between jobs. Well, leisure mixed in with a little closet cleaning and weed pulling that is. I look forward to posting all sorts of new info soon, and it'll be really interesting to see how my technical focus shifts over time.

Overview of Azure Automation

We're currently setting up Azure Automation as one component of an overall solution which uses PowerShell for managing members in an Office 365 unified group (per my recent post on the issue). I'll post those details as soon as it's completed and tested. In the meantime, this post is an introduction to what Azure Automation (AA) is all about.

Azure Automation is a cloud service in Microsoft Azure which lets you schedule execution of PowerShell cmdlets and PowerShell workflows. Azure Automation uses the concept of runbooks to execute a set of repeatable, repetitive tasks via PowerShell. Consistency in execution, reduction of errors, and of course saving time, are all key objectives - which makes DBAs and system admins happy, eh?

Examples of How You Could Use Azure Automation

  • Shut down a virtual machine in a development environment on a schedule to avoid charges when it's not being used
  • Pause Azure SQL Data Warehouse on a schedule to avoid compute charges during the time it's not serving queries or processing data loads
  • Check size of an Azure resource to determine if it's close to reaching its threshold for scaling up
  • Scale Azure resources up or down on a predefined schedule
  • Deployment of resources from Dev to Test environment (i.e., in conjunction with ARM deployments)
  • Deployment of test scenarios, particularly if an environment needs to be deleted and recreated numerous times as part of the testing cycle
  • Scheduling of routine processes such as backups, file copy operations, or custom scripts
  • Auditing and logging of all resources in a resource group (optionally with Log Analytics in Operations Management Suite)
  • Kick off a task in response to an Azure alert (requires a webhook-enabled runbook which can respond automatically to alerts)
  • Execute a task on an on-premises server (requires the Hybrid Runbook Worker feature)
  • Deployment of patches for Azure virtual machines

Obviously the above list isn't all-inclusive; there are so many ways in which automation techniques can make your life easier.
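
As an illustration of the first couple of items in the list above, here's a minimal runbook sketch that authenticates with the automation account's Run As connection and pauses an Azure SQL Data Warehouse. The resource group, server, and database names are placeholders, and this assumes the AzureRM modules available to Azure Automation at the time of this writing:

```powershell
# Minimal runbook sketch: pause an Azure SQL Data Warehouse to stop compute charges.
# Resource group, server, and database names below are placeholders.

# Authenticate using the automation account's Run As connection asset.
$conn = Get-AutomationConnection -Name 'AzureRunAsConnection'
Add-AzureRmAccount -ServicePrincipal `
    -TenantId $conn.TenantId `
    -ApplicationId $conn.ApplicationId `
    -CertificateThumbprint $conn.CertificateThumbprint

# Pause (suspend) the data warehouse.
Suspend-AzureRmSqlDatabase -ResourceGroupName 'MyResourceGroup' `
    -ServerName 'mysqlserver' `
    -DatabaseName 'MySqlDw'
```

A runbook like this would typically be tied to a Schedule asset so it runs automatically each evening, with a companion runbook to resume the data warehouse in the morning.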

The "Lingo" in Azure Automation

  • Automation Account: The container for your runbooks and assets; associated with a Run As account and certificate
  • Runbook: Contains the PowerShell scripts which perform the operation(s)
  • Jobs: History of when runbooks were executed
  • Assets: Objects which are reusable across runbooks, including:
    • Schedules: Schedule to execute a runbook hourly, daily, weekly, or monthly
    • Modules: aka Integration Modules - PowerShell modules uploaded to AA or imported from the Gallery, each of which contains encapsulated functionality
    • Variables: Values for variables which can be called from runbooks
    • Credentials: Encrypted user name and password for authenticating to an external source
    • Certificates: Encrypted .cer or .pfx file for authenticating to an external source
    • Connections: Encrypted info re: connecting to an external source
  • Hybrid Runbook Worker: Allows you to run AA runbooks on your on-premises (non-cloud) resources
  • Webhook: Allows you to execute the runbook from an HTTP request; this allows for a runbook to be dependent upon something else that has occurred
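
To show how a few of these assets are typically consumed from inside a runbook, here's a hedged sketch (the asset names are hypothetical and would need to exist in your automation account first):

```powershell
# Hypothetical asset names: each asset must be created in the automation account.

# Retrieve a Variable asset (for example, a target resource group name).
$resourceGroup = Get-AutomationVariable -Name 'TargetResourceGroup'

# Retrieve a Credential asset (encrypted user name and password).
$cred = Get-AutomationPSCredential -Name 'ServiceAccountCredential'

# Retrieve a Connection asset (returns the connection's fields).
$conn = Get-AutomationConnection -Name 'AzureRunAsConnection'

Write-Output "Running against resource group $resourceGroup as $($cred.UserName)"
```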

Getting Started With Azure Automation

  • The first thing to do is create an Azure Automation account.
  • When you set up an Azure Automation account, you get 4 tutorial runbooks created automatically (though they aren't scheduled). After you've perused and learned from them, you probably want to delete them. 
  • There is a Runbook Gallery which contains a ton of runbooks created by Microsoft or the developer community.
  • Microsoft Press has a free e-book. It was published mid-2015, so some things have definitely changed since then - for instance, only PowerShell workflows were supported at the time it was written.

Tips, Suggestions, and Comments About Azure Automation

  • When you create an Azure Automation account, give some thought to the scope of how you intend to use it. Depending on how you have chosen to set up resource groups within your subscription (a *very* important decision!), you might want to align your automation account in a similar way. Do keep in mind that runbooks can't be shared across different automation accounts, so that might lend itself to the automation account being more broad so as not to duplicate scripts. Conversely, owner/contributor/reader/operator roles are all at the automation account level which might justify being more granular.
  • Schedules can be set to run as frequently as once an hour (at the time of this writing, fall 2016).
  • Look into using Assets for sharing connections, credentials, and variable values across runbooks (within the scope of one automation account).
  • Runbooks can be nested; that is, one runbook can call another runbook. This allows for reuse of scripts or other useful patterns. (The runbook being called needs to be in a published status for this to work.)
  • There are 3 statuses: draft, edit, and published. If you have "undeployed" changes in draft or edit status, a published version of your runbook will keep running in its published state.
  • If a runbook is being edited, its status will show as edit. If you wish to make changes to a runbook that's in edit status (i.e., being edited by another contributor), you should wait until it's safely back in published status. Otherwise, two authors at one time will absolutely, positively get in each other's way.
  • Design any runbooks, or checkpoint durations, to be less than 3 hours. Currently (fall 2016), any Azure Automation job that runs for 3 hours will be temporarily unloaded. This means it will be restarted from the beginning, or from the most recent checkpoint if applicable. (This limit used to be 30 minutes but was increased to 3 hours in 2015.)
  • There are 4 types of runbooks:  PowerShell, PowerShell Workflow, Graphical PowerShell, and Graphical PowerShell Workflow. They can't currently be converted from one type to another. 
  • Azure Automation can be integrated with GitHub for source control. Visual Studio Online (TFS) is coming soon.
  • Not every Azure service is supported by Azure Automation, so be sure to verify support for your scenario.
  • Runbooks can be created directly in the Azure portal if you like, or they can be imported (.ps1 or .graphrunbook file types). An imported file needs to be <1MB.
  • Be sure to read the chapter called Best Practices in Using Azure Automation as part of the e-book. It's got lots of good tips. In the first edition, it's chapter 6 which starts on page 83.

Desired State Configuration

Desired State Configuration (DSC) is a specific type of Azure Automation capability. It utilizes PowerShell in a declarative way to enforce configuration settings or validate a specific state. DSC monitors computers in your environment, which are referred to as target "nodes." The nodes can be in Azure or on-premises, physical or virtual.

Examples of using Desired State Configuration include:

  • Validation that machines comply with corporate security standards
  • Auditing of server state, registry settings, environment variables, etc
  • Validation of server roles and features enabled
  • Management of files, folders, settings, software installations, etc
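
For a sense of what the declarative syntax looks like, here's a minimal DSC configuration sketch that ensures a Windows feature is present on a node (the node and feature names are just examples):

```powershell
# Minimal DSC sketch: declare the desired state rather than scripting the steps.
Configuration WebServerBaseline
{
    # The node name is an example; in Azure Automation DSC it would map to a registered node.
    Node 'WebServer01'
    {
        # Ensure the IIS role is installed on the target node.
        WindowsFeature IIS
        {
            Ensure = 'Present'
            Name   = 'Web-Server'
        }
    }
}
```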
