The Elephant is coming to NYC ...

With a flourishing ecosystem and central position in the big data marketplace, Hadoop continues to grow. In a recent poll of DBTA magazine subscribers, 30% of organizations reported having Hadoop deployed, while 26% indicated they are planning to adopt it over the next 12 months. From data offloading to preprocessing, Hadoop is not only enabling the analysis of new data sources but also changing the value equation of maintaining an active archive of all your data.

Whether your organization is currently considering Hadoop or already using it in production, Hadoop Day is your opportunity to connect with the experts in New York City and advance your knowledge base. This unique educational event has all the bases covered:

Integrating Hadoop into Your IT Environment
Hadoop Cluster Administration
MapReduce Programming
Hadoop Architectures
Hadoop and Your Data Warehouse
Building Hadoop Applications
Using Hadoop for Analytics
Hadoop and the Cloud
Data Security and Hadoop
Harnessing the Hadoop Ecosystem (Spark, YARN, Hive, Pig... the list goes on)

Tuesday, May 12, 2015

CONTINENTAL BREAKFAST

8:00 a.m. - 9:00 a.m.

WELCOME & KEYNOTE - Understanding the Data Value Chain

9:00 a.m. - 9:45 a.m.

Creating value from data requires a new mind-set. It’s hard to escape silos, whether they are technical or conceptual. To fully exploit the opportunity of Big Data tools and architectures, we need a new way of thinking that frames data as a raw material of business. The answer is to focus not on the functional components (what you do to data) but on business outcomes and how they can be achieved (what you do with data). This new approach can be cultivated by looking at the data value chain.

MODERATOR: Marydee Ojala, Editor, Online Searcher, Computers in Libraries Magazine, & Editor-in-Chief, KMWorld Magazine

Edd Dumbill, VP Strategy, Silicon Valley Data Science

Agile "AppStore" (SBA) Creation on a Rich Search Index

9:45 a.m. - 10:00 a.m.

The bad reputation of enterprise search will change as more powerful technology allows the extension of search to many enterprise data sources. Once enterprises have done the ground work of indexing all or most of their data, they can do things they had never thought of before, such as easily and rapidly developing Search Based Applications (SBAs) to meet user needs. This talk will present several of the SBAs that populate the AstraZeneca "AppStore" on top of the Sinequa platform.

Hans-Josef Jeanrond, VP Marketing, Sinequa

Rob Hernandez, Data Analytics Lead, CTO Office, AstraZeneca

COFFEE BREAK in the Data Solutions Showcase

10:00 a.m. - 10:45 a.m.

H101: The Current State of Hadoop

10:45 a.m. - 11:45 a.m.

Apache Hadoop has become the predominant Big Data platform for storing and analyzing data. Companies use Hadoop to get value and gain competitive differentiation from their ever-increasing wealth of data. Knowing where and how to start exploring Hadoop's rich set of tools is a “Big Data” challenge of its own. Learn the key differences between the most popular Hadoop distributions so you can start using Hadoop today.

The Hadoop Ecosystem

James Casaletto, Principal Solutions Architect, Professional Services, MapR

Hadoop: Whose to Choose

David Teplow, Founder & CEO, Integra Technology Consulting

H102: Hadoop and Your Data Warehouse

12:00 p.m. - 12:45 p.m.

Elliott Cordo shares real-world insights across a range of topics, including the evolving best practices for building a data warehouse on Hadoop that also coexists with multiple processing frameworks and additional non-Hadoop storage platforms, the place for massively parallel-processing and relational databases in analytic architectures, and the ways in which the cloud offers the ability to quickly and cost-effectively establish a scalable platform for your Big Data warehouse.

Building a Real-World Data Warehouse

Elliott Cordo, CEO/Founder/Builder, Data Futures, LLC

Snowflake and Data Warehouses

Greg Rahn, Director of Product Management, Snowflake Computing

ATTENDEE LUNCH in the Data Solutions Showcase

12:45 p.m. - 2:00 p.m.

H103: Hadoop in the Cloud

2:00 p.m. - 2:45 p.m.

To get your Big Data job done right, you need to use the right Big Data tools. How can you make sure you are leveraging the right tools? Learn from Ben Sgro about how Simulmedia, a pioneer in audience-based advertising on TV, is using a custom Python framework to programmatically create EMR clusters, move data to and from Amazon Simple Storage Service, and load data into its Redshift data warehouse. Xplenty’s Yaniv Mor talks about how using Hadoop in a coding-free, cloud-based environment ensures that businesses can benefit from Big Data without having to invest in hardware, software, or related personnel.

Python and EMR for MapReduce ETLs in the Cloud

Ben Sgro, Director of Data Engineering, Simulmedia

Offloading Data Integration/ETL to the Cloud (Using Hadoop)

Yaniv Mor, Founder & CEO, Xplenty

COFFEE BREAK in the Data Solutions Showcase

2:45 p.m. - 3:15 p.m.

H104: Harnessing the Hadoop Ecosystem

3:15 p.m. - 4:00 p.m.

Big Data is transforming how companies analyze information and enabling them to connect with customers in ways never possible before. Radius, which provides companies with a real-time marketing intelligence platform, is moving its core infrastructure from Hadoop to Spark. Hear Spotright’s Nathan Halko talk about his experiences moving from Hadoop to Spark. Qubole’s Jason Huang provides an overview of Apache Hive, the key differences between Hive and traditional data warehouses built on top of RDBMSs, and key techniques to increase performance and simplify Hive.

Moving From Hadoop to Spark: The Business Case

Nathan P. Halko, Data Scientist, Spotright

Deep Dive Into Apache Hive

Jason Huang, Senior Solutions Architect, Qubole

H105: Panel Discussion: The Data Lake: From Hype to Reality

4:15 p.m. - 5:00 p.m.

There has been a lot of hype around data lakes and their relevance to Big Data challenges. The data lake approach is being championed by some as a way to realize the promise of Big Data, allowing organizations to move data in its raw form into a central storage reservoir until it is needed. There has also been much scrutiny in the marketplace over the potential pitfalls of data lakes. To find out what you need to know before you dive into the data lake, join Venkat Eswara of GE, Joe Caserta of Caserta Concepts, and George Corugedo of RedPoint Global for a lively panel discussion about using Hadoop to create a centralized processing pool where data is captured, cleansed, linked, and structured in a consistent way.

Joe Caserta, Founding President, Caserta

George Corugedo, CTO, RedPoint Global Inc.