I’ve been working with Microsoft business intelligence products since their official inclusion in the product line back in the days of SQL Server 7, around 1998. From that time forward, the business intelligence line of products has seen periods of greater or lesser innovation according to Redmond’s strategic plan. Lately, we’re in an upswing. In late 2019, Microsoft brought forth its latest iteration of its business intelligence tools, Azure Synapse Analytics. With this release, Microsoft further aligned and clarified its capabilities for handling data warehousing, data lakes, data pipelines, and machine learning.
What’s New in Azure Synapse Analytics?
Innovating and improving the core data warehousing engine is always a part of the plan. However, major new features are explicitly designed to take advantage of the flexibility and elasticity of SaaS. For example, you can provision your data warehousing workloads explicitly or you can take advantage of new serverless, on-demand provisioning, meaning that you don’t pay for compute resources when you’re not using the data warehouse. Neither of those options guarantees that you’ll save money compared to purchasing on-premises SQL Server Analysis Services (SSAS). But they give you a dramatically wider range of options, flexibility, and scalability.
Related to my earlier point about the flexibility and potential cost savings of serverless computing, Azure Synapse Analytics improves its competitive position over AWS Redshift in two ways. First, Microsoft has greatly improved the processing power of the product, beating AWS Redshift in a series of public TPC-H benchmarks. Second, by floating compute and storage resources independently, you can get a cost savings compared to AWS Redshift, which requires them to rise or fall in lockstep. Synapse allows you to pause and even stop compute as needed, but Redshift requires compute resources to always be up and running, accumulating fees. With the release of Azure Synapse Analytics, Microsoft furthers that lead by integrating Azure Data Lake Storage (ADLS) options into the data warehouse. Now, you can choose between Azure Blob Storage, Azure Premium Storage, the enhanced service tier known as “Gen2,” and ADLS.
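To make the pause-and-scale argument concrete, here is a minimal back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the `Rates` class, both function names, and especially the hourly and per-terabyte rates are invented, and none of them reflect actual Microsoft or AWS pricing. It only shows the shape of the arithmetic: when compute can be paused, you pay for the hours you run plus the storage you keep, rather than for every hour in the month.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical prices, invented for illustration (not real vendor pricing)."""
    compute_per_hour: float      # cost of one hour of running compute
    storage_per_tb_month: float  # cost of keeping one TB for a month


def decoupled_monthly_cost(rates: Rates, active_hours: float, stored_tb: float) -> float:
    """Compute is billed only while running; storage is billed separately."""
    return rates.compute_per_hour * active_hours + rates.storage_per_tb_month * stored_tb


def always_on_monthly_cost(rates: Rates, stored_tb: float, hours_in_month: float = 730) -> float:
    """Same rates, but compute can never be paused, so every hour is billed."""
    return decoupled_monthly_cost(rates, hours_in_month, stored_tb)


# Assumed workload: the warehouse is busy 200 hours a month and holds 5 TB.
rates = Rates(compute_per_hour=2.0, storage_per_tb_month=20.0)
print(decoupled_monthly_cost(rates, active_hours=200, stored_tb=5))  # 500.0
print(always_on_monthly_cost(rates, stored_tb=5))                    # 1560.0
```

With these made-up numbers the storage bill is identical in both cases; the whole difference comes from the compute hours you don't run. Your real savings, if any, depend on actual rates and on how idle your warehouse truly is.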
If you are interested in data lakes, Microsoft expands what you can do with this release by integrating open source Apache Spark (http://spark.apache.org), the industry leader for highly parallelized analytic workloads under both streaming and batch processing models. (Azure Databricks, a commercial implementation, was available earlier in the release cycle.) This integration provides a seamless way to create Spark SQL tables via Azure Data Factory and store them in Apache Parquet format without first invoking T-SQL commands such as CREATE EXTERNAL TABLE.
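For readers who haven't created a Parquet-backed Spark SQL table before, the following is a rough sketch of what that looks like in plain PySpark. It is not the Azure Data Factory workflow itself; it assumes you are in a notebook or script with PySpark available, and the helper name `save_as_parquet_table`, the `demo` database, and the sample rows are all hypothetical. The point is simply that the table is defined from the Spark side, with no T-SQL involved.

```python
def save_as_parquet_table(df, table_name):
    """Persist a Spark DataFrame as a Spark SQL table stored in Parquet format.

    `df` is expected to be a pyspark.sql.DataFrame; `table_name` is a
    hypothetical name such as "demo.orders".
    """
    (df.write
       .mode("overwrite")      # replace the table if it already exists
       .format("parquet")      # store the table's files as Apache Parquet
       .saveAsTable(table_name))
    return table_name


def demo():
    """Illustrative usage; requires PySpark and is not run automatically."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-table-sketch").getOrCreate()
    spark.sql("CREATE DATABASE IF NOT EXISTS demo")

    # Made-up sample rows, for illustration only.
    df = spark.createDataFrame(
        [(1, "north", 120.0), (2, "south", 75.5)],
        ["order_id", "region", "amount"],
    )
    save_as_parquet_table(df, "demo.orders")

    # The table is now queryable with Spark SQL.
    spark.sql(
        "SELECT region, SUM(amount) AS total FROM demo.orders GROUP BY region"
    ).show()
```

Calling `demo()` in an environment with Spark installed writes the table and queries it back. In Synapse, a notebook normally hands you a ready-made Spark session, so you would likely skip the builder line.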
Azure Synapse Studio, the Improved Web UI
Interacting with your data warehouses and data lakes gets easier using the new Azure Synapse Studio, while Azure Data Factory continues to serve as the workhorse data pipeline toolkit. I’ve long talked about how much I like Jupyter Notebooks, and you now get that experience with Azure Synapse Studio. That means you can, at present, develop and execute code blocks in Python, Scala, Spark SQL, and T-SQL. This plasticity also means that you can manage a variety of machine learning workloads. Love it! Note that Azure Synapse Studio, while similar in many ways, is NOT the same as the desktop-based Azure Data Studio, which I wrote about in my October 2019 column.
Confused Yet?
One thing you may have noticed is that our options keep multiplying. While we now have even more flexibility and capability, we also have to figure out exactly what we need and in what amount. The first obvious example is that you now have two distinct UIs, Azure Data Studio and Azure Synapse Studio, that provide you with a notebook experience and handle code from a variety of programming languages. And that’s just a simple UI. It gets even more confusing when you weigh options such as Azure Databricks versus Apache Spark, and whether your choice will run on SQL Server 2019 Big Data Clusters (BDC) or Azure Synapse, and consider a variety of tiers of compute and storage, whether you are licensed by vCores and/or DTUs, and so much more. I fear that customer confusion will get worse, perhaps by a lot, before it gets better.
Learn More!
It’s an exciting time to be working in data. We’re able to provide more value to our organizations than ever before. Taking on the new versions of our core data management toolkits and the capabilities therein will further amplify our ability to exceed the expectations of our stakeholders. Read more about Synapse at https://azure.microsoft.com/en-us/blog/simply-unmatched-truly-limitless-announcing-azure-synapse-analytics and get started right away with a free Azure Synapse account at https://azure.microsoft.com/free/sql-data-warehouse.