Streaming data was examined from two perspectives in a session titled “The Streaming Future of Big Data” at Data Summit 2017, held at the New York Hilton Midtown, May 16-17, 2017.
The session, part of the Hadoop Day track, was moderated by Unisphere analyst Joe McKendrick.
In a presentation titled “Streaming of Big Data Over the Decades,” Roger C. Rea, IBM Streams product manager, IBM Watson and Cloud Platform, looked at the current state of streaming and what’s ahead. Historical data analysis, he explained, examines persisted data after the fact: it follows a batch philosophy and a pull, on-demand approach. Streaming analytics, on the other hand, analyzes data in motion at the speed it is created, follows a push approach, and delivers continuous insights about the current moment.
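The pull-versus-push distinction can be made concrete with a short sketch. The following is a generic illustration (not code from the presentation): a batch store answers only when queried, while a stream pushes an updated insight to subscribers as each event arrives.

```python
# Generic sketch of batch (pull) vs. streaming (push) analytics.
from typing import Callable, List


class BatchStore:
    """Pull model: data is persisted, then analyzed on demand."""

    def __init__(self) -> None:
        self._rows: List[int] = []

    def append(self, value: int) -> None:
        self._rows.append(value)

    def query_total(self) -> int:
        # Insight arrives only when someone asks for it.
        return sum(self._rows)


class Stream:
    """Push model: each event is analyzed the moment it is created."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[int], None]] = []

    def subscribe(self, callback: Callable[[int], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, value: int) -> None:
        for cb in self._subscribers:
            cb(value)  # continuous insight, no polling


# Same events flow through both models.
store, stream = BatchStore(), Stream()
totals: List[int] = []  # a running total updated on every event
stream.subscribe(lambda v: totals.append((totals[-1] if totals else 0) + v))

for event in [3, 1, 4]:
    store.append(event)
    stream.publish(event)

# Batch yields one answer on demand; streaming yielded one after every event.
print(store.query_total())  # 8
print(totals)               # [3, 4, 8]
```

The batch side produces a single answer when polled; the streaming side has already produced an answer at every step, which is the “continuous insights” property Rea described.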
Rea also offered seven of his own predictions for streaming in the near future:
- One technology becomes winner-take-all (2% odds)
- Half the current vendors/technologies die in 5 years (20% odds)
- Half the current vendors/technologies die in 10 years (50% odds)
- Apache Beam becomes a uniting development API (40% odds)
- Apache Calcite becomes a uniting development API (20% odds)
- Data Volumes, Varieties, and Velocities will continue to grow (100% odds)
- Streaming Analytics outpaces traditional Hadoop/Spark market (70% odds)
Diving into streaming technologies and the related use of containers and microservices, Paul Curtis, senior field enablement engineer at MapR, also offered a presentation on “Event-Driven Microservices with Streams & Docker.”
This presentation looked at how to build a multi-location, event-driven architecture that uses streaming data to interconnect Docker-hosted microservices, providing scalable, highly available services across multiple data centers.
Streams as the data transport operate at web scale, are reliable and redundant, offer global reach, with the ability to move any data from many locations to many locations, have a small footprint, and, with containers, enable easy development and deployment.
Looking at the attributes of the combined technologies, Curtis said that containers are flexible, with deployment strategies to meet business needs, and ease development by making it possible to deploy globally using a single-node cluster and Docker on a laptop. Streams are the “plumbing,” with well-known APIs and the ability to architect deployments for the reliability and redundancy that business goals require. Together, Curtis said, containers and streams make testing and development simple: adding a service becomes nothing more than adding a new container application.
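The pattern Curtis described can be sketched in a few lines. Below is a minimal, single-process Python stand-in, not MapR or Kafka API code: the `MiniStream` class and the two service names are hypothetical. The point it shows is the decoupling property: services talk only to the stream, so adding a service is nothing more than adding a new subscriber (in production, a new container).

```python
# Hypothetical in-memory stand-in for a distributed message stream
# (the role MapR Streams or Kafka would play in production).
from collections import defaultdict
from typing import Callable, Dict, List


class MiniStream:
    """Topic-based pub/sub: services share the stream, never direct calls."""

    def __init__(self) -> None:
        self._topics: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._topics[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._topics[topic]:
            handler(event)


class OrderService:
    """Producer microservice: emits events rather than calling peers."""

    def __init__(self, stream: MiniStream) -> None:
        self._stream = stream

    def place_order(self, order_id: str, amount: float) -> None:
        self._stream.publish("orders", {"id": order_id, "amount": amount})


class RevenueService:
    """Consumer microservice: added by subscribing, without touching producers."""

    def __init__(self, stream: MiniStream) -> None:
        self.total = 0.0
        stream.subscribe("orders", self._on_order)

    def _on_order(self, event: dict) -> None:
        self.total += event["amount"]


stream = MiniStream()
orders = OrderService(stream)
revenue = RevenueService(stream)  # "deploying" a new service = one subscription

orders.place_order("o-1", 19.99)
orders.place_order("o-2", 5.01)
print(revenue.total)  # 25.0
```

In a real deployment each service would run in its own container and the stream would be a replicated, cross-data-center service; the interconnection pattern, however, is the same.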
Many conference presentations have been made available by speakers at www.dbta.com/datasummit/2017/presentations.aspx.