As Hadoop adoption in the enterprise continues to grow, so does commitment to the data lake strategy.
DBTA recently held a webinar with Mark Van de Wiel, CTO of HVR; Dale Kim, senior director of products/solutions at Arcadia Data; and Rick Golba, product marketing manager at Percona, who discussed unlocking the power of the data lake.
Data lakes organize large, diverse sets of data; enable access to data with minimal latency; store data in its raw, detailed state; and support multiple use cases and architectures, Van de Wiel explained.
The three main challenges of adopting a data lake are continuous feed, security, and trusting the data. To overcome these issues, Van de Wiel recommended:
- Continuous feed: log-based change data capture (CDC)
- Security: encryption and certificates
- Trust the data: data compare (see the sketch after this list)
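
A data compare validates that a replicated target still matches its source, typically by checksumming rows on both sides. The sketch below is a minimal illustration of that idea, not HVR's actual implementation: it uses two in-memory SQLite databases to stand in for a source system and a data lake target, and the `orders` table is hypothetical.

```python
import hashlib
import sqlite3

def table_checksum(conn: sqlite3.Connection, table: str) -> str:
    """Hash every row of a table in a deterministic order.

    A production data-compare tool would also handle type
    normalization, parallel/partitioned hashing, and in-flight
    changes; this shows only the core idea.
    """
    digest = hashlib.sha256()
    # Sorting makes the checksum independent of physical row order.
    for row in conn.execute(f"SELECT * FROM {table} ORDER BY 1"):
        digest.update(repr(row).encode("utf-8"))
    return digest.hexdigest()

# Hypothetical source and target standing in for a database and a data lake.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for conn in (source, target):
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 9.99), (2, 24.50)])

match = table_checksum(source, "orders") == table_checksum(target, "orders")
print("tables match" if match else "tables diverge")
```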
Because data is so widespread in enterprises, Kim suggested creating two separate BI standards: data warehouse BI and data lake BI.
BI built for data warehouses fails in data lakes because it scales inefficiently, cannot handle data diversity, and is agile in name only.
BI for data lakes must be architected for scale and performance, Kim said. Native BI unleashes the power and flexibility of the data lake by:
- Scaling without compromise
- Enabling real-time, streaming analytics (see the sketch after this list)
- Unlocking complex data not easily reachable before
- Acting directly from your data discovery
- Optimizing and productionizing based on usage and need
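
One way to picture real-time, streaming analytics is a continuous query over an event stream. The sketch below simulates a tumbling-window count in plain Python; the clickstream data and 10-second window are hypothetical, and a production data lake would run this inside a streaming engine rather than a simple loop.

```python
from collections import Counter
from typing import Iterable, Iterator

def tumbling_window_counts(events: Iterable[tuple[float, str]],
                           window_seconds: float) -> Iterator[tuple[int, Counter]]:
    """Count events per key in fixed, non-overlapping time windows.

    Each event is a (timestamp, key) pair arriving in time order;
    results are emitted as (window_index, counts) once a window closes.
    """
    current_window, counts = None, Counter()
    for timestamp, key in events:
        window = int(timestamp // window_seconds)
        if current_window is not None and window != current_window:
            yield current_window, counts   # window closed; emit its counts
            counts = Counter()
        current_window = window
        counts[key] += 1
    if current_window is not None:
        yield current_window, counts       # flush the final open window

# Hypothetical clickstream: (epoch seconds, page) pairs arriving in order.
clicks = [(0.5, "home"), (3.1, "search"), (9.9, "home"),
          (10.2, "checkout"), (14.7, "home")]
for window, counts in tumbling_window_counts(clicks, window_seconds=10):
    print(f"window {window}: {dict(counts)}")
```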
The attributes of a successful data lake include data movement, data storage, analytic options, and support for machine learning, Golba said.
Hosting a data lake in the cloud is another option and can provide the following benefits:
- Low-cost storage enables large volumes of data to be stored
- Different storage options for different access needs (see the sketch after this list)
- Scalability is built in
- Access is easily made available to authorized users
- Flexibility of the cloud permits new technologies to be spun up and spun down to try out different applications
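
On AWS, for example, the "different storage options" point often maps to object lifecycle rules that move colder data into cheaper tiers; other clouds offer equivalent policies. The sketch below is one illustration using boto3, assuming AWS credentials are configured; the bucket name, prefix, and day thresholds are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical rule: after 30 days, move raw landing-zone objects to
# infrequent-access storage; after 90 days, move them to archival storage.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",            # hypothetical bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-raw-zone",
                "Status": "Enabled",
                "Filter": {"Prefix": "raw/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
            }
        ]
    },
)
```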
An archived on-demand replay of this webinar is available here.