Free Online Webinars
Length: 1 Hour
Title: Building Large-Scale, Transactional Data Lakes using Apache Hudi
Time: 2:00 PM - 2:45 PM
Description: Hudi (Hadoop Upserts Deletes and Incrementals) is a storage abstraction library that improves data ingestion. Uber's Nishith Agarwal explains what Hudi offers and why it is needed. He covers how Hudi can provide ACID semantics to a data lake, and the basic primitives (such as upsert and delete) needed to achieve acceptable ingestion latencies while keeping data quality high by enforcing schemas on datasets. He also discusses more advanced primitives, such as restore, delta-pull, compaction, and file sizing, which are required for reliability, efficient storage management, and building incremental ETL pipelines. Finally, he shows how to onboard your existing datasets to Hudi while keeping the same open-source file formats, so you can start using all of Hudi's features without making drastic changes to your data lake.
Title: Data Virtualization: Modernizing Data Access in Hybrid Environments
Time: 2:45 PM - 3:00 PM
Description: Even as data lakes grow, the traditional and cloud data sources in the enterprise have not disappeared. Most companies run a hybrid environment in which data resides across data lakes, traditional on-prem sources, and the cloud. Data lakes are also commonly used to gather all types of data, so the quality and consistency of that data can be questionable. Hybrid systems and data-quality concerns make Data Virtualization a critical component for providing and managing data access services for consumers, analytics, and presentation layers. Inessa Gerber discusses how Data Virtualization provides trusted, governed data access and ensures the consistency, quality, and governance of all your data across the enterprise, making your data lake a key source of quality data. She explains why Data Virtualization is an essential component of any modern deployment that spans data lakes and hybrid environments.