When it comes to implementing a big data strategy in a Microsoft SQL Server shop, you’re generally going to consider one of three strategies. In the first strategy, which I won’t be addressing today, you might tackle your big data architecture by pushing your relational database to its scalability limits, perhaps by making greater use of features such as SQL Server’s columnstore indexes and its optimizations for BI applications.
The other two strategies are my topic today: on-premises big data and cloud/hybrid big data. In this article, I’m going to introduce big data in the cloud but focus most of my attention on on-premises big data. Look for a deeper dive into cloud-based big data architecture in the next article.
Microsoft SQL Server and Big Data in the Cloud
With SQL Server 2012, and even more so with the upcoming SQL Server 2014 release, Microsoft has built out a very strong Apache Hadoop offering on Windows Azure called HDInsight. HDInsight is a 100% Apache Hadoop solution, bringing all of the value and capability of Hadoop to managing and processing enormous data sets, whether structured or unstructured. HDInsight on Windows Azure offers fully elastic scalability, strong security, and easy management through PowerShell scripting. You can query HDInsight with LINQ and Hive, and you can work with it through standard tools such as PowerPivot, Power View, Power Query, and Excel 2013.
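To give a feel for how approachable Hive is to anyone with a SQL Server background, here’s a minimal HiveQL sketch of the kind of query you could submit to an HDInsight cluster. The table and column names are hypothetical, purely for illustration.

```sql
-- Hypothetical table over log files stored in Azure blob storage;
-- the names are placeholders, but the syntax is standard HiveQL.
SELECT country,
       COUNT(*) AS visits
FROM   web_logs
WHERE  log_date >= '2014-01-01'
GROUP BY country
ORDER BY visits DESC
LIMIT 10;
```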
Despite all of that goodness, it is in the cloud—and a lot of people aren’t ready to go there yet.
Microsoft + Hortonworks = Big Data On-Premises
When an on-premises big data approach is the right architectural choice for you, your first stop should be Hortonworks, Microsoft’s partner of choice for enterprise Hadoop implementations. The Hortonworks Data Platform (HDP) packages the latest Apache Hadoop releases along with a great many additional features, capabilities, and tools, providing better management, data processing, and core operations than other Hadoop distributions.
I spent some time recently chatting with Jim Walker, director of product marketing at Hortonworks (@jaymce on Twitter). “Adoption has been brisk in the last couple years,” Walker tells me, “but 2014 is looking to be the inflection point, that is, the elbow in the hockey-stick graph, for broader enterprise adoption. It’s going to be a very big year.” Microsoft and Hortonworks have one of the strongest and most vibrant partner relationships I’ve seen in the ISV space (and as an ISV myself, I’ve seen many). The engineering teams from the two companies work together on a daily basis and, in fact, Hortonworks has a significant staff on-site in Redmond.
As an enterprise architect, one of my top concerns when implementing significant new technology is how to ensure a successful rollout when my staff is already incredibly busy supporting our existing infrastructure.
Hortonworks deals with the rollout of new technology in three ways based on the role of the staff member:
- Development—HDP offers perhaps best-in-class integration with the .NET development framework. HDP also has strong hooks for Spring, a popular Java framework.
- Analysts—Thanks to Apache Hive and the Stinger initiative, HDP will feel familiar to anyone who knows SQL. Walker tells me that HDP has added a variety of enhancements and SQL semantics so that “analysts can run interactive queries against petabytes of data returning results in 10 seconds or less” directly against the HDP back end, without having to learn a new query language (see the query sketch after this list).
- Operations—Managing hundreds of nodes is tough! HDP works with Microsoft System Center and similar products such as Teradata Viewpoint. In addition, HDP integrates well with major data integration tools such as SQL Server Integration Services, Informatica, and the open source Talend product.
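To make the analyst experience concrete, here’s a hedged sketch of the kind of interactive query Walker is describing. The fact table and its columns are invented for illustration; the point is that it’s plain SQL-style HiveQL, not a new query language.

```sql
-- Hypothetical fact table; with Stinger-era Hive (ORC storage, vectorized
-- execution), an aggregation like this can return interactively.
SELECT   product_category,
         COUNT(DISTINCT session_id) AS sessions,
         SUM(page_views)            AS total_views
FROM     sales_events
WHERE    event_date BETWEEN '2014-01-01' AND '2014-01-31'
GROUP BY product_category
ORDER BY total_views DESC;
```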
The short story is, if you’re doing development, analysis, or operations using ubiquitous Microsoft tools today, then HDP will be a quick and easy platform to add to your shop.
Taking the Next Step With Hortonworks
If you’re the more experimental type, you can get right to the evaluation by downloading the Hortonworks Sandbox (Hortonworks.com/sandbox), a single-node VM that includes everything in the stack, plus plenty of technical tutorials, such as how to create a table using Hive, and business-case tutorials, such as how to analyze clickstream data. Hortonworks also offers a surprisingly large number of online classes you can attend or watch as streaming media. It’s easy to get started!
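To give you a sense of what those sandbox tutorials cover, here’s a hedged sketch along the same lines: defining a Hive table over clickstream files and then querying it. The path, table, and column names are placeholders of my own, not the tutorial’s actual names.

```sql
-- Define a Hive table over raw clickstream files already sitting in HDFS.
-- Path and names below are illustrative placeholders.
CREATE EXTERNAL TABLE clickstream_raw (
    event_time STRING,
    user_id    STRING,
    url        STRING,
    referrer   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/sandbox/clickstream';

-- Then explore it with ordinary SQL-style queries, e.g. the top landing pages.
SELECT url,
       COUNT(*) AS hits
FROM   clickstream_raw
GROUP BY url
ORDER BY hits DESC
LIMIT 20;
```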
Kevin Kline, a longtime Microsoft SQL Server MVP, is a founder and former president of PASS and the author of SQL in a Nutshell. Kline tweets at @kekline and blogs at http://kevinekline.com.