The State of Stateful v. Stateless Kubernetes

I’m in a state of disbelief (pun intended) when, in 2024, I often hear platform operators talk about Kubernetes being for stateless workloads only — or claim that they don’t require data protection because “all of our apps are stateless.” In this article, I’ll provide background on where state exists in our applications, trace the rise of the “Kubernetes = stateless” myth, and discuss the need for data protection across all Kubernetes environments.

Stateful vs. Stateless — The Basics

Stateless refers to a workload that does not require persistent data, such that restarting or redeploying the workload discards any data collected or settings changed along the way. Compared with their stateful virtual machine (VM) predecessors, containers are inherently stateless. By restarting a container, a user knows it is running exactly what was intended when the container image was initially created.

Following this logic, a stateful workload is one that preserves data such that it can be accessed across restarts. Common stateful workloads include databases like PostgreSQL, MySQL, and MongoDB.
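As a sketch of what this looks like in practice, the Kubernetes StatefulSet below runs a single PostgreSQL replica with a volumeClaimTemplate so its data directory survives pod restarts. All names, the image tag, the Secret reference, and the storage size are illustrative placeholders, not a production configuration.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: pg            # hypothetical name
spec:
  serviceName: pg
  replicas: 1
  selector:
    matchLabels:
      app: pg
  template:
    metadata:
      labels:
        app: pg
    spec:
      containers:
        - name: postgres
          image: postgres:16
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: pg-secret   # assumed to exist
                  key: password
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  # Each replica gets its own PersistentVolumeClaim, so data
  # persists across pod restarts and rescheduling.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Deleting and recreating the pod reattaches the same volume, which is precisely the persistence a stateless workload lacks.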

So, then, what is a stateless application?

Applications typically consist of multiple workloads. Even monolithic, VM-based applications generally consist of separate frontend, business logic, and database workloads. Cloud-native, microservice-based applications will typically consist of many more separate workloads, each with its own specific function — with the goal of enabling organizations to ship new features and services faster than ever before.

This means that, overwhelmingly, stateless applications do not exist.

Your favorite food ordering app, a “to do” app on your phone, a form you fill out to book a doctor’s appointment — all of these would be effectively useless if there weren’t some stateful workloads responsible for persisting the critical data associated with each of these use cases.

With Kubernetes, it’s critical to understand where your application data, or state, resides — inside or outside the cluster.

How We Got Here — Kubernetes = Stateless?

As Docker became a de facto container platform in the early to mid-2010s for packaging software, having a solution to orchestrate groups of containers at scale was critical to extract the full operational value from a shift to microservices — and Kubernetes delivered. Declarative deployments, high availability, and automated scaling were significant benefits for stateless workloads that could be stopped, restarted, and cloned with ease. These workloads, often providing front-end or application-server compute, could be connected to data services running outside of the cluster. The solution was hailed by many as a success. Kubernetes was clearly a solution designed for stateless workloads, right?

Yet, all the way back in Kubernetes v1.0, alongside all these great capabilities for stateless workloads, sat Persistent Volumes — a Kubernetes-native mechanism for attaching persistent storage to ephemeral pods. Clearly, the intention to support stateful workloads was a consideration from the beginning, with several notable enhancements along the way, including StatefulSets for running replicated data services with stable identities (GA in v1.9), the Container Storage Interface (CSI) for standardized storage integrations (GA in v1.13), and volume snapshots for point-in-time copies (GA in v1.20).
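For reference, this is roughly how a workload requests persistent storage with those primitives: a PersistentVolumeClaim asks the cluster for a volume, and a pod mounts the claim. Names, the image, and the size below are illustrative.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data      # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx:1.27
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    # The pod references the claim; the cluster binds it to an
    # actual Persistent Volume provisioned by the storage backend.
    - name: data
      persistentVolumeClaim:
        claimName: app-data
```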

These improvements, combined with efforts of many different vendors and project contributors, have created a rich ecosystem providing users with the freedom to choose which data services will provide their application state and where those workloads will run.

The Rise of GitOps

During this period, the Kubernetes community saw the rise of yet another idea to improve container orchestration — GitOps. The concept is simple: to use source control as the source of truth, storing everything required to deploy an application, including code and Kubernetes resources. Every time code is merged into the repository to make a change, a controller updates the deployment to reflect the desired state described in Git. GitOps implementations can provide a mechanism for change control, the ability to redeploy a whole environment with a single click, and a way to revert bad changes — sometimes.
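A minimal sketch of the pattern using Argo CD, one popular GitOps controller: an Application resource points the controller at a Git repository, and the controller keeps the cluster in sync with whatever the repo describes. The repository URL, path, and names here are hypothetical.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/my-app.git  # hypothetical repo
    targetRevision: main
    path: deploy/              # manifests live here in the repo
  destination:
    server: https://kubernetes.default.svc
    namespace: my-app
  syncPolicy:
    automated:
      prune: true              # delete resources removed from Git
      selfHeal: true           # revert out-of-band cluster drift
```

Merging a change to `deploy/` in Git is all it takes to roll the cluster forward — and `git revert` rolls the configuration back.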

Just because Kubernetes can run stateful workloads, does that mean that you should?

Tasked with building a proof-of-concept application, a developer may often opt for a cloud-hosted, managed database, or DBaaS, to host their persistent data outside of the Kubernetes cluster. DBaaS solutions provide developer-friendly APIs and quick time to value regardless of a user’s level of database administration expertise. However, as is often true, what’s easiest isn’t always best. Let’s explore the reasons to consider running stateful workloads inside vs. outside of the cluster.

Opting to consolidate stateless and stateful workloads on Kubernetes can increase the flexibility of where applications can run. It can also provide additional self-service for DevOps, lower costs, and streamline tooling and process. This makes it possible to apply GitOps methodology to all parts of an application. It should be no surprise then, according to Datadog’s most recent report on real-world container use, that databases continue to be the most popular category of containerized workloads.

Keeping Your Kubernetes Data Protected

Team Stateful

If you’re already convinced that stateful workloads on Kubernetes are the way forward, you likely already understand that providing backup and disaster recovery for these applications requires a purpose-built tool. Each application comprises many different Kubernetes resources (ConfigMaps, Secrets, Deployments, etc.) alongside persistent volume data. What’s more, all of these need to be properly discovered, snapshotted, and exported to a safe location off-cluster.

Enter Veeam Kasten for Kubernetes, a Kubernetes-native data protection and application mobility solution. Kasten provides organizations with an easy-to-use, scalable, and secure way to create immutable backups and recover entire applications quickly and reliably.
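As an illustration of how this looks declaratively, the sketch below follows the general shape of a Kasten Policy custom resource: a daily backup of one application namespace, retained for a week, exported to off-cluster object storage. The namespace and profile names are hypothetical, and exact fields should be checked against the Kasten documentation for your version.

```yaml
apiVersion: config.kio.kasten.io/v1alpha1
kind: Policy
metadata:
  name: my-app-backup
  namespace: kasten-io
spec:
  comment: Daily backup and off-cluster export of my-app
  frequency: "@daily"
  retention:
    daily: 7
  actions:
    - action: backup          # snapshot resources + volumes
    - action: export          # copy the restore point off-cluster
      exportParameters:
        frequency: "@daily"
        profile:
          name: object-storage-profile   # assumed location profile
          namespace: kasten-io
  selector:
    matchLabels:
      k10.kasten.io/appNamespace: my-app
```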

As Kasten provides a declarative, Kubernetes-native API for defining policies, performing actions, and more, data protection can be tightly integrated into any GitOps implementation. This is true across a spectrum of use cases, from the deployment and configuration of Kasten on a new cluster, to enhancing GitOps by automating backups before rolling out code updates. By itself, GitOps can roll back configuration changes to a Kubernetes resource, but if a bad change alters data on a persistent volume (e.g., an incorrect schema change or a dropped table), you’ll want a recent backup that can be restored rapidly.
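One way the backup-before-rollout idea can be wired up — assuming Argo CD and an existing Kasten policy named my-app-backup, both hypothetical here — is an Argo CD PreSync hook that triggers a Kasten RunAction before each deployment sync. Treat this as a sketch of the pattern rather than a verified integration recipe.

```yaml
apiVersion: actions.kio.kasten.io/v1alpha1
kind: RunAction
metadata:
  generateName: pre-deploy-backup-
  annotations:
    # Argo CD runs this resource before syncing the application,
    # so a fresh restore point exists prior to the code change.
    argocd.argoproj.io/hook: PreSync
spec:
  subject:
    kind: Policy
    name: my-app-backup       # hypothetical policy from your Kasten setup
    namespace: kasten-io
```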

Team Stateless

Still not ready to trade in your favorite DBaaS for a shiny new Kubernetes-hosted, operator-managed database? Kasten still has your back. Even “stateless” applications consist of Kubernetes resources (ConfigMaps, Secrets, Deployments, and more) that must be captured to recover from accidental deletion, misconfiguration, or cluster loss — which is why Kubernetes-native data protection is still necessary.

Team Kubernetes Data Protection

Kubernetes has evolved to support all types of workloads, regardless of the lingering belief that Kubernetes is suitable only for stateless workloads. As organizations work to extract the most value from their cloud-native investments, a continued shift towards running stateful workloads is inevitable. Increased performance and mobility, simplified copy data management, polyglot persistence, and cost savings are all benefits waiting to be realized.

Whether running stateful or stateless workloads, data protection remains crucial. Veeam Kasten for Kubernetes provides a Kubernetes-native solution for application and data backup, disaster recovery, application mobility, and ransomware protection. Additionally, GitOps methodologies can be enhanced by integrating data protection into application deployments. Ultimately, embracing stateful workloads on Kubernetes can not only unlock revenue opportunities, but also bring greater flexibility, self-service, cost-effectiveness, and streamlined operations to organizations.

Try our Veeam Kasten Free solution or our Enterprise trial to get started today.
