Computing has changed dramatically since the mainframe was first introduced. Thanks to a proliferation of options for handling Big Data more naturally and efficiently than relational database management systems (RDBMS), we are in a “post-relational era.”
David Teplow, CEO of Integra Technology Consulting, presented his session, “SQL’s Sequel: Hadoop and the Post-Relational Revolution,” on Tuesday, May 22, 2018, during Data Summit 2018. The session took a deep dive into the best solutions to use in this “era.”
Data Summit 2018 is taking place at the Hyatt Regency Boston, May 22-23, with pre-conference workshops on Monday, May 21. Cognitive Computing Summit will also be co-located at the event.
Teplow explained that Hadoop was born from two published papers outlining the Google File System and MapReduce. From these, massively parallel processing (MPP) architectures were born.
MPP systems can be classified by structure and by feature. Structures include:
- Relational
- Key-Value
- Wide-Column
- Document
Features include:
- Consistency
- Availability
- Partition Tolerance
Relational MPP systems are best for full SQL and ACID support, whereas Key-Value MPP systems are best for fast-changing data and high concurrency.
Wide-Column MPP systems are best for extreme write volumes, and Document MPP systems are best for slow-changing, read-mostly systems, Teplow said.
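As a rough illustration, the pairings Teplow described can be written as a small lookup table. This is only a sketch: the names `BEST_FOR` and `candidates` are hypothetical, and the simple keyword match is an assumption made for the example, not a selection method from the session.

```python
# Hypothetical lookup built only from the pairings reported in the session.
BEST_FOR = {
    "Relational": "full SQL and ACID support",
    "Key-Value": "fast-changing data and high concurrency",
    "Wide-Column": "extreme write volumes",
    "Document": "slow-changing, read-mostly systems",
}


def candidates(requirement: str) -> list[str]:
    """Return the system types whose reported strength mentions the requirement."""
    needle = requirement.lower()
    return [
        kind
        for kind, strength in BEST_FOR.items()
        if needle in strength.lower()
    ]


if __name__ == "__main__":
    # Look up which structure was recommended for write-heavy workloads.
    print(candidates("write"))
```

A real selection would weigh far more than a keyword, but the table makes the four recommendations easy to compare side by side.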
Where there’s choice, however, there are tradeoffs: Teplow explained that the CAP Theorem states that a system can have only two of the three features at once.
The CAP Theorem was proposed and presented by Eric Brewer (Professor at UC Berkeley / Co-Founder & Chief Scientist at Inktomi) at the 2000 Symposium on Principles of Distributed Computing (PODC), then proven by Seth Gilbert and Nancy Lynch of MIT.
According to the Theorem, only two of the following attributes can be present in your solution:
- Consistency - each client always has the same view of the data
- Availability - all clients can always read and write
- Partition Tolerance - the system works well across physical network partitions
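The “two of three” rule can be made concrete by listing every pair of attributes and the one each pair gives up. The sketch below is purely illustrative; `CAP_PROPERTIES` and `cap_tradeoffs` are hypothetical names, and the code models only the counting argument, not the behavior of any real database.

```python
from itertools import combinations

CAP_PROPERTIES = ("Consistency", "Availability", "Partition Tolerance")


def cap_tradeoffs(properties=CAP_PROPERTIES):
    """Return each pair of properties a design can keep, with the one it gives up."""
    tradeoffs = []
    for kept in combinations(properties, 2):
        # Exactly one property is left out of every pair of three.
        given_up = next(p for p in properties if p not in kept)
        tradeoffs.append((kept, given_up))
    return tradeoffs


if __name__ == "__main__":
    for kept, given_up in cap_tradeoffs():
        print(f"keep {kept[0]} + {kept[1]}; give up {given_up}")
```

The three resulting combinations are the ones commonly abbreviated CA, CP, and AP.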
“It’s important to understand the differences between the many options and the tradeoffs that come with each,” Teplow said.
Data Summit 2019, presented by DBTA and Big Data Quarterly, is tentatively scheduled for May 21-22, 2019, at the Hyatt Regency Boston with pre-conference workshops on May 20.
Many presentations from Data Summit 2018 have been made available for review at https://www.dbta.com/DataSummit/2018/Presentations.aspx.