A data warehouse is a central repository of corporate data derived from operational systems and external data sources. The main function of a data warehouse is to support strategic business decisions by enabling data analysis and reporting at aggregate levels. In this sense, a data warehouse is a central component of business intelligence.
Examples of things your company might do in a data warehouse include:
- Analyzing sales data across many products.
- Measuring and reporting on financial performance across all products/services or lines of business.
Cloud Data Warehouse
One of the most important shifts in data warehousing in recent times has been the emergence of the cloud data warehouse. Previously, setting up a data warehouse required a huge investment in IT resources to build and manage a specially designed on-premise data center. Now, several cloud computing vendors offer data warehousing functions as a service (DWaaS), accessible via an Internet connection. This model negates the costly capital expenditure and management required for an on-premise data warehouse.
An important result of cloud data warehousing is that there are new data warehouse concepts that you need to understand. As a start, this post comparing data warehouse concepts in the cloud to traditional on-premise data centers can help familiarize you with the newer terminology.
The availability of cloud data warehouses makes data warehousing much more accessible to a wider range of companies. However, before you go rushing into choosing a vendor and getting set up, first understand that managing a data warehouse in the cloud presents a whole new set of challenges, regardless of whether you’ve managed an on-premise setup before. Five of the main challenges and some recommended solutions are outlined below to help you better prepare for managing a data warehouse in the cloud.
5 Challenges of Cloud-Based Data Warehouses
Different Cost Structure
The cost structure for cloud data warehouses differs considerably from how you’d pay for standard servers in an on-premise data center. By moving to the cloud, you switch to a rental model, in which you need to select the right infrastructure for your cloud data warehouse workloads efficiently.
Even though the cloud is a more cost-efficient option for most businesses in need of data warehousing capabilities, it can still become extremely expensive, and costs can spiral out of control. According to the 2018 RightScale State of The Cloud report, enterprise cloud users are wasting 35% of their cloud spend.
To overcome potential issues with operating on a different cost structure, it’s a good idea to closely monitor cloud usage and scale down underutilized resources. A cloud management platform can take some of the complexity out of keeping track of expenses; cost transparency and optimization are important features of such platforms.
Skill Shortages
Many companies cite a lack of resources and expertise as barriers to adopting cloud-based services; a data warehouse is one of the more complex systems to implement in the cloud and it is not possible without the right skills.
To run a successful cloud data warehouse, companies need people with knowledge of cloud architecture, a technical understanding of integrating data into cloud systems, and an in-depth understanding of cloud security and governance.
Cloud expertise is expensive; however, it is a worthwhile investment to define new roles and hire staff to fill them. At a minimum, you can enlist the help of a cloud architect, who should already have data center management and architecting experience. The cloud architect should also possess expert knowledge about the services of your main DWaaS provider.
Cloud Security & Governance
The security of data in the cloud is imperative, particularly when you consider the business-critical nature of a data warehouse. Even though security is an obvious concern, most cloud providers have watertight security features that are difficult for attackers to penetrate.
The most pressing aspect of cloud security and governance going forward is the potential for customers to cause security failures. In fact, Gartner predicted that by 2020, 95% of cloud security failures would be the customer's fault.
The main step an enterprise can take to meet this challenge is to recognize its responsibility for the secure use of the public cloud. Your company can draw up a formal cloud security and governance plan, which sets out how to use the cloud securely and in compliance with relevant industry regulations.
Cloud access security broker (CASB) software looks set to grow in popularity, and it makes for a good choice when you move to a cloud-based data warehouse. Such software sits between cloud service users and cloud applications, monitoring cloud usage and enforcing security policies in the process.
Data Movement
The initial movement of huge petabyte-scale datasets into the cloud might strike you as the main problem in getting set up with a cloud data warehouse; however, this part is actually quite straightforward. Physical transport services, such as the Amazon Snowball data transport solution, are a cost-effective option for one-off massive transfers of data from your company’s location to the cloud provider’s systems.
The real difficulty emerges when trying to incorporate data movement to and from the cloud seamlessly into your company’s enterprise data flow. As enterprises gather more data, this information needs to be extracted from operational databases, transformed, and loaded into the cloud-native data warehouse, a process known as ETL (extract, transform, load).
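As a rough illustration of the extract, transform, and load steps described above, here is a minimal sketch in Python. It uses two in-memory SQLite databases as stand-ins for an operational database and a cloud data warehouse; the `orders` and `product_revenue` tables, their columns, and the `run_etl` function are hypothetical names chosen for this example, and a real pipeline would use your warehouse vendor's own loading tools instead.

```python
import sqlite3


def run_etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Copy aggregated revenue per product from source to warehouse."""
    # Extract: read raw order rows from the (stand-in) operational database.
    rows = source.execute(
        "SELECT product, quantity, unit_price FROM orders"
    ).fetchall()

    # Transform: aggregate individual orders into revenue per product.
    revenue = {}
    for product, quantity, unit_price in rows:
        revenue[product] = revenue.get(product, 0.0) + quantity * unit_price

    # Load: write the aggregates into the (stand-in) warehouse table.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS product_revenue "
        "(product TEXT PRIMARY KEY, revenue REAL)"
    )
    warehouse.executemany(
        "INSERT OR REPLACE INTO product_revenue VALUES (?, ?)",
        sorted(revenue.items()),
    )
    warehouse.commit()
    return len(revenue)


# Hypothetical sample data, assumed purely for illustration.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (product TEXT, quantity INTEGER, unit_price REAL)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("widget", 2, 5.0), ("gadget", 1, 20.0), ("widget", 3, 5.0)],
)
warehouse = sqlite3.connect(":memory:")

loaded = run_etl(source, warehouse)
result = warehouse.execute(
    "SELECT product, revenue FROM product_revenue ORDER BY product"
).fetchall()
print(loaded, result)
```

The transform step here is a simple aggregation, which mirrors the aggregate-level reporting a data warehouse is meant to support.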
ETL pipelines need to be built to work with a cloud-native data warehouse. Cloud architects and developers can help overcome the challenge of data movement. Bear in mind also that many cloud users want to move subsets of data from the cloud data warehouse back to on-premises data centers and servers. Most public cloud providers charge for data egress, and the costs can quickly skyrocket.
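To get a feel for how egress charges add up, a back-of-the-envelope estimate can help. The sketch below is only an assumption-laden example: the `egress_cost` function, the per-gigabyte rate, and the free allowance are hypothetical values, not any vendor's actual pricing, so substitute the figures from your provider's price list.

```python
def egress_cost(gb_per_month: float, price_per_gb: float, free_gb: float = 0.0) -> float:
    """Estimate the monthly egress charge for data moved out of the cloud."""
    # Only the volume above the free allowance is billed.
    billable_gb = max(gb_per_month - free_gb, 0.0)
    return round(billable_gb * price_per_gb, 2)


# Hypothetical figures: 5,000 GB moved back on-premises each month,
# an assumed rate of 0.09 per GB, and an assumed 100 GB free allowance.
monthly = egress_cost(gb_per_month=5000, price_per_gb=0.09, free_gb=100)
yearly = round(monthly * 12, 2)
print(monthly, yearly)
```

Running the same calculation for a few projected data volumes shows quickly whether moving subsets of data back on-premises fits the budget.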
Lack of Standardization
The DWaaS model, like many types of cloud service, is still an emerging technology. Unfortunately for prospective customers, emerging technology markets lack standardization, which can make your choice of cloud vendor confusing.
Different vendors use different cost models; some employ a flat monthly charge while others opt for a pay-as-you-go model. Furthermore, cloud vendors differ in the level of operational support they provide. Some data warehouse options require you to provision the resources and manage everything while others use a serverless approach wherein the task of managing and provisioning cloud servers is abstracted away from the customer.
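One way to compare a flat monthly charge with a pay-as-you-go model is to work out the usage level at which the two cost the same. The following sketch assumes a simplified pay-as-you-go model with a single price per unit of usage; the function names, the prices, and the "unit" itself are hypothetical, since vendors meter usage in different ways.

```python
def break_even_units(flat_monthly: float, price_per_unit: float) -> float:
    """Usage level at which pay-as-you-go costs the same as the flat charge."""
    return flat_monthly / price_per_unit


def cheaper_plan(flat_monthly: float, price_per_unit: float, units_used: float):
    """Return the cheaper plan and its monthly cost for a given usage level."""
    usage_cost = price_per_unit * units_used
    if flat_monthly <= usage_cost:
        return ("flat", flat_monthly)
    return ("pay-as-you-go", usage_cost)


# Hypothetical prices: a flat charge of 2000 per month versus 5 per unit used.
threshold = break_even_units(2000, 5)
light_usage = cheaper_plan(2000, 5, units_used=100)
heavy_usage = cheaper_plan(2000, 5, units_used=600)
print(threshold, light_usage, heavy_usage)
```

Under these assumed prices, pay-as-you-go is cheaper below the break-even usage and the flat charge is cheaper above it, so an estimate of your expected workload is the key input to the comparison.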
The main challenge you need to face is finding a cloud provider that best meets your needs in terms of budget, integration with existing processes, and SLA. The lack of standardization in cloud services shouldn’t be a barrier to managing a data warehouse in the cloud; it just means that you need to perform sufficient due diligence.
Drawing up a due diligence checklist for cloud computing, broken down into sub-categories such as terms of service, security, and data backup, can go a long way towards making sure you’ve covered all angles when researching a cloud data warehouse provider.
Weighing the Challenges Vs. Benefits
There are evidently some major challenges to managing a data warehouse in the cloud. However, by understanding those challenges and how to solve them, you can best prepare for a migration to the cloud. The affordability, scalability, and computing power of cloud data warehouses all make these challenges worth overcoming.