The State of Stateful v. Stateless Kubernetes

I’m in a state of disbelief (pun intended) when, in 2024, I often hear platform operators talk about Kubernetes being for stateless workloads only — or claim that they don’t require data protection because “all of our apps are stateless.” In this article, I’ll provide background on where state exists in our applications, trace the rise of the “Kubernetes = stateless” myth, and discuss the need for data protection across all Kubernetes environments.

Stateful vs. Stateless — The Basics

Stateless refers to a workload that does not require persistent data, such that restarting or redeploying the workload discards any data collected or settings changed along the way. Compared with their stateful virtual machine (VM) predecessors, containers are inherently stateless. By restarting a container, a user knows it is running exactly what was intended when the container image was initially created.

Following this logic, a stateful workload is one that preserves data such that it can be accessed across restarts. Common stateful workloads include databases like PostgreSQL, MySQL, and MongoDB.
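As a sketch of what this looks like in practice, the Kubernetes StatefulSet below runs a single PostgreSQL replica with a volumeClaimTemplate so its data directory survives pod restarts. All names, the image tag, the Secret reference, and the storage size are illustrative placeholders, not a production configuration.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pg            # hypothetical name
spec:
  serviceName: pg
  replicas: 1
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: pg-secret   # assumed to exist
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  # Each replica gets its own PersistentVolumeClaim, so data
  # persists across pod restarts and rescheduling.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Deleting and recreating the pod reattaches the same volume, which is precisely the persistence a stateless workload lacks.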

So, then, what is a stateless application?

Applications typically consist of multiple workloads. Even monolithic, VM-based applications generally consist of separate frontend, business logic, and database workloads. Cloud-native, microservice-based applications will typically consist of many more separate workloads, each with its own specific function — with the goal of enabling organizations to ship new features and services faster than ever before.

This means that, overwhelmingly, stateless applications do not exist.

Your favorite food ordering app, a “to do” app on your phone, a form you fill out to book a doctor’s appointment — all of these would be effectively useless if there weren’t some stateful workloads responsible for persisting the critical data associated with each of these use cases.

With Kubernetes, it’s critical to understand where your application data, or state, resides — inside or outside the cluster.

How We Got Here — Kubernetes = Stateless?

As Docker became a de facto container platform in the early to mid-2010s for packaging software, having a solution to orchestrate groups of containers at scale was critical to extract the full operational value from a shift to microservices — and Kubernetes delivered. Declarative deployments, high availability, and automated scaling were significant benefits for stateless workloads that could be stopped, restarted, and cloned with ease. These workloads, often providing front-end or application-server compute, could be connected to data services running outside of the cluster. The solution was hailed by many as a success. Kubernetes was clearly a solution designed for stateless workloads, right?

Yet, all the way back in Kubernetes v1.0, alongside all these great capabilities for stateless workloads, sat Persistent Volumes — a Kubernetes-native mechanism for attaching persistent storage to ephemeral pods. Clearly, the intention to support stateful workloads was a consideration from the beginning, with several notable enhancements along the way, including StatefulSets for running replicated data services with stable identities (GA in v1.9), the Container Storage Interface (CSI) for standardized storage integrations (GA in v1.13), and volume snapshots for point-in-time copies (GA in v1.20).
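For reference, this is roughly how a workload requests persistent storage with those primitives: a PersistentVolumeClaim asks the cluster for a volume, and a pod mounts the claim. Names, the image, and the size below are illustrative.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data      # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx:1.27
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    # The pod references the claim; the cluster binds it to an
    # actual Persistent Volume provisioned by the storage backend.
    - name: data
      persistentVolumeClaim:
        claimName: app-data
```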

These improvements, combined with efforts of many different vendors and project contributors, have created a rich ecosystem providing users with the freedom to choose which data services will provide their application state and where those workloads will run.

The Rise of GitOps

During this period, the Kubernetes community saw the rise of yet another idea to improve container orchestration — GitOps. The concept is simple: to use source control as the source of truth, storing everything required to deploy an application, including code and Kubernetes resources. Every time code is merged into the repository to make a change, a controller updates the deployment to reflect the desired state described in Git. GitOps implementations can provide a mechanism for change control, the ability to redeploy a whole environment with a single click, and a way to revert bad changes — sometimes.
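A minimal sketch of the pattern using Argo CD, one popular GitOps controller: an Application resource points the controller at a Git repository, and the controller keeps the cluster in sync with whatever the repo describes. The repository URL, path, and names here are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/my-app.git  # hypothetical repo
    targetRevision: main
    path: deploy/              # manifests live here in the repo
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true              # delete resources removed from Git
      selfHeal: true           # revert out-of-band cluster drift
```

Merging a change to `deploy/` in Git is all it takes to roll the cluster forward — and `git revert` rolls the configuration back.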

Just because Kubernetes can run stateful workloads, does that mean that you should?

Tasked with building a proof-of-concept application, a developer may often opt for a cloud-hosted, managed database, or DBaaS, to host their persistent data outside of the Kubernetes cluster. DBaaS solutions provide developer-friendly APIs and quick time to value regardless of a user’s level of database administration expertise. However, as is often true, what’s easiest isn’t always best. Let’s explore the reasons to consider running stateful workloads inside vs. outside of the cluster.

Opting to consolidate stateless and stateful workloads on Kubernetes can increase the flexibility of where applications can run. It can also provide additional self-service for DevOps, lower costs, and streamline tooling and process. This makes it possible to apply GitOps methodology to all parts of an application. It should be no surprise then, according to Datadog’s most recent report on real-world container use, that databases continue to be the most popular category of containerized workloads.

Keeping Your Kubernetes Data Protected

Team Stateful

If you’re already convinced that stateful workloads on Kubernetes are the way forward, you likely already understand that providing backup and disaster recovery for these applications requires a purpose-built tool. Each application comprises many different Kubernetes resources (ConfigMaps, Secrets, Deployments, etc.) alongside persistent volume data. What’s more, all of these need to be properly discovered, snapshotted, and exported to a safe location off-cluster.

Enter Veeam Kasten for Kubernetes, a Kubernetes-native data protection and application mobility solution. Kasten provides organizations with an easy-to-use, scalable, and secure way to create immutable backups and recover entire applications quickly and reliably.
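As an illustration of how this looks declaratively, the sketch below follows the general shape of a Kasten Policy custom resource: a daily backup of one application namespace, retained for a week, exported to off-cluster object storage. The namespace and profile names are hypothetical, and exact fields should be checked against the Kasten documentation for your version.

```yaml
apiVersion: config.kio.kasten.io/v1alpha1
kind: Policy
metadata:
  name: my-app-backup
  namespace: kasten-io
spec:
  comment: Daily backup and off-cluster export of my-app
  frequency: "@daily"
  retention:
    daily: 7
  actions:
    - action: backup          # snapshot resources + volumes
    - action: export          # copy the restore point off-cluster
      exportParameters:
        frequency: "@daily"
        profile:
          name: object-storage-profile   # assumed location profile
          namespace: kasten-io
  selector:
    matchLabels:
      k10.kasten.io/appNamespace: my-app
```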

As Kasten provides a declarative, Kubernetes-native API for defining policies, performing actions, and more, data protection can be tightly integrated into any GitOps implementation. This is true across a spectrum of use cases, from the deployment and configuration of Kasten on a new cluster, to enhancing GitOps by automating backups before rolling out code updates. By itself, GitOps can roll back configuration changes to a Kubernetes resource, but if a bad change alters data on a persistent volume (e.g., an incorrect schema change or a dropped table), you’ll want a recent backup that can be restored rapidly.
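One way the backup-before-rollout idea can be wired up — assuming Argo CD and an existing Kasten policy named my-app-backup, both hypothetical here — is an Argo CD PreSync hook that triggers a Kasten RunAction before each deployment sync. Treat this as a sketch of the pattern rather than a verified integration recipe.

```yaml
apiVersion: actions.kio.kasten.io/v1alpha1
kind: RunAction
metadata:
  generateName: pre-deploy-backup-
  annotations:
    # Argo CD runs this resource before syncing the application,
    # so a fresh restore point exists prior to the code change.
    argocd.argoproj.io/hook: PreSync
spec:
  subject:
    kind: Policy
    name: my-app-backup       # hypothetical policy from your Kasten setup
    namespace: kasten-io
```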

Team Stateless

Still not ready to trade in your favorite DBaaS for a shiny new Kubernetes-hosted, operator-managed database? Kasten still has your back. Even “stateless” applications consist of Kubernetes resources (ConfigMaps, Secrets, Deployments, and more) that must be captured to recover from accidental deletion, misconfiguration, or cluster loss — which is why Kubernetes-native data protection is still necessary.

Team Kubernetes Data Protection

Kubernetes has evolved to support all types of workloads, regardless of the lingering belief that Kubernetes is suitable only for stateless workloads. As organizations work to extract the most value from their cloud-native investments, a continued shift towards running stateful workloads is inevitable. Increased performance and mobility, simplified copy data management, polyglot persistence, and cost savings are all benefits waiting to be realized.

Whether running stateful or stateless workloads, data protection remains crucial. Veeam Kasten for Kubernetes provides a Kubernetes-native solution for application and data backup, disaster recovery, application mobility, and ransomware protection. Additionally, GitOps methodologies can be enhanced by integrating data protection into application deployments. Ultimately, embracing stateful workloads on Kubernetes can not only unlock revenue opportunities, but also bring greater flexibility, self-service, cost-effectiveness, and streamlined operations to organizations.

Try our Veeam Kasten Free solution or our Enterprise trial to get started today.
