With very few exceptions, today’s IT modernization strategies include moving workloads from traditional data centers into the cloud. In fact, a recent survey on cloud adoption by LogicMonitor indicates enterprises will run 83% of their workloads in the cloud. Topping the list of business initiatives driving this trend is digital transformation, followed by IT agility.
The shift to cloud eliminates the challenges of running a physical data center. However, organizations still face the challenge of managing software deployments: ensuring that their applications are always running, and running optimally (i.e., consuming the right level of resources).
Cloud service providers have long understood that the key to providing these shared compute resources is having massive resource pools that are virtualized or containerized into smaller, multi-tenant storage or compute units, which are then charged out to customers. But while the abstraction of storage and compute has been achieved, managing applications all the way from the operating system (OS) up, while accounting for variations at each layer, has remained a major challenge.
Over the past five years, we have seen the emergence of new technologies such as Docker that enable organizations to bring the same abstraction and scaling to the application side. In short, Docker provides a way for a software developer to package the entire software stack into a container, i.e., a Docker image. The developer then tests and ships their software as one or more Docker images, knowing that it can be deployed anywhere a Docker runtime is available, and it will work exactly as they intended and tested. The developer now controls the entire software deployment stack without having to cater to every nuance of the target environment.
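As a concrete sketch of this idea, a Dockerfile captures the whole stack in one build recipe. The base image, file names, and entry point below are illustrative assumptions, not taken from any specific product:

```dockerfile
# Illustrative Dockerfile: base image, files, and entry point are assumptions
FROM python:3.9-slim

# Copy the application and its pinned dependencies into the image
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# The container starts the app the same way on any host with a Docker runtime
CMD ["python", "app.py"]
```

Building with `docker build -t myapp:1.0 .` and running with `docker run myapp:1.0` then behaves the same on a laptop, an on-premise server, or a cloud VM.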
Most enterprise software these days consists of multiple smaller components or services; gone are the days of large monolithic applications. When software consists of multiple components, Docker Compose provides a way for a developer to specify how these services should start and coordinate. But making sure that these individual processes keep running, and keep coordinating as originally designed, remains a challenge at many organizations.
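A minimal sketch of such a Compose file, with hypothetical service names and images, might look like:

```yaml
# Hypothetical docker-compose.yml: service names and images are illustrative
version: "3.8"
services:
  web:
    image: myapp/web:1.0
    ports:
      - "8080:8080"
    depends_on:
      - db          # start order: the database comes up before the web tier
  db:
    image: postgres:13
    environment:
      POSTGRES_PASSWORD: example
```

Compose brings the services up together on one host, but by itself it does not reschedule failed containers across machines or scale them with demand, which is exactly the gap described next.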
And then there is the need to scale applications up and down in response to usage or demand, keeping the user experience pleasant without over-paying for more infrastructure than is absolutely necessary. Because load or demand is never constant, this requires some form of dynamic scaling, often referred to as auto-scaling.
In the world of data preparation, this is of critical importance. Not only are workloads complex, involving engines such as Spark, data ingest pipelines, data profiling, and storage, but they also often require expanding compute power while these processes run and contracting it again when the jobs complete.
The emergence of Kubernetes (K8s), originally developed by Google, now gives organizations the means to deploy and schedule these Docker containers, restart them when they fail, and scale them out when needed.
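A minimal Kubernetes Deployment manifest shows how this works: you declare the desired number of replicas, and the control plane schedules the containers and restarts them if they fail. The names, labels, and image below are illustrative assumptions:

```yaml
# Illustrative Deployment: name, labels, and image are assumptions
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3              # Kubernetes keeps three copies running at all times
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp/web:1.0
        resources:
          requests:        # the scheduler places pods on nodes with this capacity
            cpu: "250m"
            memory: "256Mi"
```

If a node dies or a container crashes, Kubernetes reschedules pods to honor `replicas: 3` with no operator intervention.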
It is the combination of Docker and Kubernetes that is creating a tipping point that will accelerate the industry toward a serverless world capable of powering business agility, lowering administration and management overhead, and disrupting cost structures. The tipping point is caused by three factors:
- With Kubernetes, one does not have to actively babysit applications to make sure they are running, scaling, and coordinating. That heavy lifting is done by the framework. This changes traditional Ops in a fundamental way.
- The framework is available on all major cloud providers. Some of them, such as Azure, have made their K8s offering (i.e., control plane) free. This means you only have to pay for the VMs (host nodes) you consume. The orchestrator is available at no cost.
- Low cost of entry. If you can ship your application as a Docker image, then you are just a few steps away from making it available on K8s.
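The auto-scaling behavior noted above can also be expressed declaratively. A HorizontalPodAutoscaler, sketched here against a hypothetical `myapp` Deployment, grows and shrinks the replica count with load:

```yaml
# Illustrative HorizontalPodAutoscaler targeting a hypothetical Deployment
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2           # floor during quiet periods, so idle capacity is not paid for
  maxReplicas: 10          # ceiling during demand spikes
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add pods when average CPU crosses 70%
```

This is the mechanism that lets the cluster expand for a burst of data preparation jobs and contract again once they finish.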
This is of vital importance in the world of analytics and data science, and preparing the data for these critical business initiatives holds tremendous potential for all parties, including:
- Supply side: Software developers are in a unique position to lower their costs by building and supporting applications delivered as Docker containers. The approach is similar to large retail chains like Safeway or Walmart requiring suppliers not only to deliver their products but also to handle how those products are displayed and shelved inside the store. Complex applications like Spark and other analytical engines are now easier to compose into a single isolated container that can run anywhere. Interestingly, in many of our conversations with customers, they are beginning to set standards to work only with software that ships in Docker containers.
- Demand side: With Kubernetes as the orchestrator to deploy and manage these containers across very complex, hybrid cloud environments, customers now get the benefit of improved agility when deploying technologies in support of business initiatives. In addition, Kubernetes enables auto-scaling of workloads and helps ensure that these resources deliver high performance and availability. Today, several large, sophisticated IT buyers hesitate to even license applications that will not run on their K8s clusters. The buyer checklist in many evaluations now includes the question "do you ship a deployment YAML file for your application along with the necessary images?" Customers want their applications to keep running as expected in a lights-out scenario.
While the combination of Kubernetes and Docker delivers some great advances for software development and deployment in cloud environments, it is also increasingly being considered in on-premise settings. In fact, the LogicMonitor study indicated that by 2020 some 27% of workloads will remain on-premise.
This may be because organizations with vast data center resources understand that these on-premise workloads are not going away quickly. In many countries and industries, security concerns remain a barrier to cloud entry. Adopting Kubernetes orchestration on-premise promises some of the same benefits achievable in the cloud. And it opens the door to much easier shifting of workloads between on-premise and cloud should the need or opportunity arise.