To fit into modern analytics ecosystems, legacy data warehouses must evolve, both architecturally and technologically, to deliver the agility, scalability, and flexibility that businesses need to thrive in today's data-driven economy.
Alongside new architectural approaches, a variety of technologies have emerged as key ingredients of modern data warehousing, from data virtualization and cloud services to Hadoop, Spark, machine learning, and automation.
DBTA recently held a webinar with Clive Bearman, director of product marketing, Qlik; Keith Lambert, VP, marketing and business development, Kore Technologies; and Brian Bulkowski, CTO, Yellowbrick, who discussed the must-have capabilities for modern data warehousing today—how they work and how best to use them.
According to Bulkowski, an enterprise data warehouse should be always on and available, support ad hoc SQL, return correct answers on any schema, process terabytes to petabytes of data, handle a mix of real-time insert, ETL, batch, and interactive workloads, and support thousands of concurrent users.
Enterprises choose data warehousing solutions to discover data-driven opportunities, Bearman said. It’s time to rework the data warehouse architecture format. Must-haves include:
- Real-time updates: Architected for real-time change data capture and analytics-ready data delivery
- Universal data: All types, sizes and velocities of enterprise data
- Automation everywhere: End-to-end automation improves productivity and responsiveness
- Self-service data marts: Curated, fit-for-purpose data
- Smart catalog: Expand knowledge of usage, lineage, confidence and trust
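As a rough illustration of the change data capture idea in the first item above, the following Python sketch applies an ordered stream of change events to a copy of a target table. It is a minimal sketch under stated assumptions, not a description of any vendor's product: the `ChangeEvent` type, the `apply_changes` function, and the sample rows are all hypothetical, and the "table" is simply a dictionary keyed by primary key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record captured from a source system."""
    op: str                     # "insert", "update", or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image; None for deletes


def apply_changes(target: dict, events: list) -> dict:
    """Apply ordered change events to a copy of `target` (keyed by primary key)."""
    result = dict(target)
    for ev in events:
        if ev.op in ("insert", "update"):
            result[ev.key] = ev.row      # upsert the latest row image
        elif ev.op == "delete":
            result.pop(ev.key, None)     # tolerate deletes of missing keys
        else:
            raise ValueError(f"unknown operation: {ev.op}")
    return result


# Illustrative data only.
warehouse = {1: {"name": "Acme", "tier": "gold"}}
events = [
    ChangeEvent("insert", 2, {"name": "Globex", "tier": "silver"}),
    ChangeEvent("update", 1, {"name": "Acme", "tier": "platinum"}),
    ChangeEvent("delete", 2),
]
updated = apply_changes(warehouse, events)
```

Only the rows that changed are touched, which is what lets this style of update keep a warehouse close to real time without reloading full tables.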
Bulkowski suggested companies use a hybrid cloud data warehouse, which offers several benefits, including the ability to shift locations easily, trade off agility against cost, and keep a single database for secure datasets.
According to Lambert, enterprises should start off with best practices and standards. Businesses should define a data model and naming standards, create a data flow diagram, build a source-agnostic integration layer, adopt a data warehouse architecture standard, and consider an agile data warehouse methodology.
Enterprises should invest in multi-source and database aggregation, point-in-time snapshots, incremental database updates, automation with message-based architecture, detailed transaction logging, the ability to analyze and profile data sources, template-driven ETL software, an easy-to-change development environment and tools, and a technology partner with reliable and flexible software, Lambert said.
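One common way to support the point-in-time snapshots mentioned above is to keep a history table with validity intervals and query it "as of" a given date. The sketch below uses Python's built-in `sqlite3` module; the `customer_history` table, its columns, the `tier_as_of` helper, and the sample dates are assumptions made for illustration and are not taken from the webinar or from any specific product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_history (
        customer_id INTEGER,
        tier        TEXT,
        valid_from  TEXT,  -- ISO date, inclusive
        valid_to    TEXT   -- ISO date, exclusive; NULL means still current
    )
    """
)
# Illustrative data only: one customer whose tier changed once.
conn.executemany(
    "INSERT INTO customer_history VALUES (?, ?, ?, ?)",
    [
        (1, "silver", "2020-01-01", "2020-06-01"),
        (1, "gold", "2020-06-01", None),
    ],
)


def tier_as_of(conn, customer_id, as_of):
    """Return the tier that was in effect on the ISO date `as_of`, or None."""
    row = conn.execute(
        """
        SELECT tier FROM customer_history
        WHERE customer_id = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to > ?)
        """,
        (customer_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None
```

Because ISO-formatted dates sort correctly as text, a call such as `tier_as_of(conn, 1, "2020-03-15")` can reconstruct the state of a record on that day without overwriting or losing later changes.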
Lambert proposed using Kourier Integrator, which is a multi-purpose solution. The platform provides:
- Near Real-Time / CDC
- Multi-Source / DBMS
- Point in Time
- SQL Automation
- Real-Time via REST
- API Subscriptions
- Inbound/Outbound
- Asynchronous/Batch
- ERP Adaptors
- Storefront/Portal
- Marketplace
- EDI
- Data Sharing
- Data Archiving
- Data Migration
- Data Cleansing
An archived on-demand replay of this webinar is available here.