Hadoop adoption in the enterprise is growing steadily, and with this momentum comes an increase in Hadoop-related projects.
From real-time data processing with Apache Spark, to data warehousing with Apache Hive, to applications that run natively across Hadoop clusters via Apache YARN, these next-generation technologies are solving real-world big data challenges today.
DBTA recently held a webcast featuring Rohit Sinha, software engineer at Cask; Reiner Kappenberger, head of global product management, enterprise data security at HPE Security - Data Security; and Danil Zburivsky, director of big data and data science at Pythian, who discussed how these technologies work, how real companies are using them, the key challenges, and the critical success factors.
Trends in big data applications include the interest in enterprise data lakes, big data analytics, and production data apps, according to Sinha.
From prep to production, users need a Hadoop solution that provides platform integration, lifecycle management, continuous delivery, metrics and logs, multi-tenancy, and security and governance.
Cask’s data app platform can assist with data integration and app development on Hadoop and Spark. The solution takes a unified approach, integrates the latest big data technologies, includes interactive user interfaces, and contains pre-built solutions, Sinha said.
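As an illustration of the kind of data app such a platform targets, the sketch below shows a basic Spark batch job on Hadoop that rolls up raw events and publishes the result as a Hive table. This is a generic example, not Cask's API; the paths, table, and column names are hypothetical.

```python
# A minimal sketch of a Spark + Hive data app on Hadoop:
# read raw events from HDFS, aggregate them, and expose the
# result as a Hive table for warehouse-style SQL queries.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("clickstream-rollup")
         .enableHiveSupport()          # lets Spark read/write Hive tables
         .getOrCreate())

# Hypothetical landing path for raw JSON events on HDFS
events = spark.read.json("hdfs:///data/raw/clickstream/")

# Roll up events per user per day
daily = (events
         .groupBy("user_id", F.to_date("event_time").alias("event_date"))
         .agg(F.count("*").alias("event_count")))

# Persist as a Hive table (hypothetical name) so analysts can query it with SQL
daily.write.mode("overwrite").saveAsTable("analytics.daily_user_activity")
```

In practice, a data app platform would add the lifecycle management, metrics, and governance around a job like this rather than replace the job itself.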
Once users have settled on the solution that is right for them, the next priority is safeguarding it. Securing the data, however, can be difficult, according to Kappenberger.
With the HPE SecureData platform, enterprises can rest assured that their Hadoop environment or data lake is secured. The platform uses encryption to provide end-to-end data protection, Kappenberger said.
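As a rough illustration of protecting sensitive fields before data lands in the lake, the sketch below encrypts one field with a symmetric key so that only key holders can recover it downstream. This is a generic example built on the Python cryptography library, not HPE SecureData's format-preserving encryption; the record and field names are hypothetical.

```python
# Minimal sketch of field-level encryption prior to ingest.
# NOT HPE SecureData; just a generic symmetric-key illustration.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # in practice the key comes from a key manager
cipher = Fernet(key)

# Hypothetical record headed for the data lake
record = {"customer_id": "C-1001", "ssn": "123-45-6789", "amount": "42.50"}
SENSITIVE_FIELDS = {"ssn"}

# Encrypt only the sensitive fields; leave the rest queryable as-is
protected = {
    field: cipher.encrypt(value.encode()).decode()
    if field in SENSITIVE_FIELDS else value
    for field, value in record.items()
}

# Downstream, only holders of the key can recover the original value
original_ssn = cipher.decrypt(protected["ssn"].encode()).decode()
```

Format-preserving approaches go further by keeping the ciphertext in the same shape as the original value, so existing schemas and applications continue to work on protected data.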
Hadoop is one of the first choices when it comes to on-premises distributed systems, Zburivsky said. Market penetration is still not significant, most likely because of the cloud. Hadoop is many projects glued together, with varying levels of success, he explained.
There is no one-stop-shop Hadoop vendor; however, users can follow a blueprint to get the best outcomes from their Hadoop systems.
Zburivsky laid out some guidelines:
- Use blueprint architectures, especially when it comes to the cloud.
- Things are still easy to get wrong.
- Be ready to integrate multiple products (both open source and proprietary).
- Look beyond what your Hadoop vendor provides.
- Review architectures often.
An archived on-demand replay of this webinar is available here.