In my last article, I told you a bit about Microsoft’s overall big data strategy in SQL Server 2014 (which has just become available to customers) with a deep dive into their on-premises partnership with Hortonworks. (You can find it online at www.dbta.com/Columns/SQL-Server-Drill-Down/Microsoft-SQL-Server-and-the-Big-Data-Play-94721.aspx.) Now, it’s time to follow up with a discussion of Microsoft’s cloud-centric approach to big data using HDInsight.
Microsoft SQL Server and Big Data in the Cloud
HDInsight is a 100% Apache Hadoop service, currently at v2.2, available through Microsoft’s Windows Azure cloud offerings. HDInsight makes all of the standard features you’d expect from a Hadoop implementation—the HDFS/MapReduce software framework and related projects such as Pig and Hive—available in a simple, scalable, and fairly cheap cloud environment. Behind the scenes, HDInsight uses Windows Azure Blob storage to manage and store data as its default file system.
The Blob storage component is optimized to store data, while Microsoft continues to use HDFS for running computations on that data. From there, HDInsight coordinates data processing across the Hadoop clusters to run MapReduce jobs on the data. Despite the use of both HDFS and Windows Azure Blob storage, all components in the Hadoop ecosystem can, by default, operate directly on the data that Blob storage manages.
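To make the "Blob storage as the default file system" idea a little more concrete, here is a minimal Python sketch of how a file in Blob storage is typically addressed from Hadoop. It assumes the wasb:// URI layout described in the Azure documentation (container, then storage account, then path), which this column does not spell out. The container name, account name, file path, and both helper functions are hypothetical and exist only for illustration.

```python
from urllib.parse import urlsplit

# Assumption: Azure Blob storage hosts follow <account>.blob.core.windows.net
BLOB_HOST_SUFFIX = ".blob.core.windows.net"


def wasb_uri(container: str, account: str, path: str) -> str:
    """Build a wasb:// URI for a blob (hypothetical helper)."""
    return f"wasb://{container}@{account}{BLOB_HOST_SUFFIX}/{path.lstrip('/')}"


def parse_wasb_uri(uri: str) -> tuple:
    """Split a wasb:// URI into (container, account, path)."""
    parts = urlsplit(uri)
    if parts.scheme != "wasb":
        raise ValueError(f"not a wasb URI: {uri}")
    host = parts.hostname or ""
    if not parts.username or not host.endswith(BLOB_HOST_SUFFIX):
        raise ValueError(f"malformed wasb URI: {uri}")
    # The container sits in the user-info slot, the account is the first host label.
    account = host[: -len(BLOB_HOST_SUFFIX)]
    return parts.username, account, parts.path.lstrip("/")


# Hypothetical names, not taken from any real deployment
example = wasb_uri("mycontainer", "myaccount", "/example/data/sample.log")
```

The point of the sketch is only that Hadoop components see a path-like address, so a Hive or Pig job can read the data without caring that it lives in Blob storage rather than on the cluster's local disks.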
HDInsight provides all of the benefits of the cloud: full elasticity, strong security through dedicated Azure Secure Nodes, and easy management through PowerShell scripting. There’s also an extensive library of PowerShell scripts available to ease the deployment and provisioning of your Hadoop clusters.
HDInsight is easy for end users to query through Microsoft’s standard suite of BI tools, such as PowerPivot, Power View and Power Query. (“I detect a branding pattern,” says Captain Obvious.) If you haven’t already taken a look, I encourage you to check out a demo of Power BI on Office 365, which is the cloud-enabled version of the product, at www.microsoft.com/en-us/powerBI/solutions/demo/default.aspx#fbid=X40wKqyIRti.
You have full support for SQL Server services, such as SQL Server Analysis Services and SQL Server Reporting Services, and you can also access HDInsight data using T-SQL and PolyBase, a feature of SQL Server 2012 Parallel Data Warehouse. (Read more about PolyBase at www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/polybase.aspx.)
Developers can program against HDInsight in a variety of languages, including Java and Microsoft .NET. And developers can query HDInsight from their apps using LINQ to Hive, as well as use the standard BI tools mentioned earlier, such as PowerPivot, Power View, Power Query, Power Map, and Excel 2013.
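If you have never written a MapReduce job, the programming model is easier to grasp from a toy example than from a description. The following Python sketch runs entirely on your own machine and does not call any HDInsight or Hadoop API. It only mimics the three phases (map, shuffle, reduce) that a real job would spread across the cluster, using the classic word count. All function names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data in the cloud", "Big Data on premises"])
```

On a real cluster, the map and reduce functions would run in parallel on many nodes and the framework would handle the shuffle for you. The logic you write, though, stays about this small.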
Taking the Next Step with HDInsight
If you’re like me in that you read the assembly instructions for your kids’ toys in full before beginning the project (it drives my kids nuts), start by reading the HDInsight documentation at http://technet.microsoft.com/en-us/library/dn247618.aspx. If you’re a hands-on learner, take advantage of the one-month free trial available at www.windowsazure.com/en-us/pricing/free-trial. (Startups get an incredibly good deal on licensing costs through the BizSpark program at www.windowsazure.com/en-us/offers/ms-azr-0057p.) And, for the full gamut of getting-started support, such as tutorials and walk-throughs, go to www.windowsazure.com/en-us/documentation/articles/hdinsight-get-started.
For Azure-centric documentation, check out www.windowsazure.com/en-us/documentation/services/hdinsight. Pricing details are published at www.windowsazure.com/en-us/pricing/details/hdinsight, along with additional details about support and SLA options, cost calculators, and geographic availability. (Separate Windows Azure pricing details are online at www.windowsazure.com/en-us/pricing/purchase-options.)