Kanister Simplifies Application-level Data Operations on Kubernetes 

IT departments have many choices for infrastructure and application deployment. Organizations choose containers for their portability, scalability and deployment speed, among other benefits. Adopting a cloud-native approach adds further advantages, such as increased agility, lower capital expenditure (CAPEX), and improved scalability and reliability.

For teams working with Kubernetes, ensuring that all data, particularly application data, is protected can be a tricky proposition. With each new Kubernetes release, users have seen improvements in running stateful workloads. However, on its own, Kubernetes lacks robust application-level data management capabilities.

Lifecycles and workflows behind cloud-native applications can be complex. Kubernetes currently allows admins to manage data in several ways: storage-centric snapshots, storage-centric snapshots combined with hooks or APIs into the application, or by leveraging data services.

Admins may take a storage-centric approach or a data-centric approach using tools such as mysqldump or pg_dump. Briefly, each has its own trade-offs: storage snapshots are fast and application-agnostic, but they capture volumes with no guarantee of application consistency, while logical dumps are application-aware but can be slow and resource-intensive for large datasets.
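For context, the storage-centric approach is typically expressed on Kubernetes as a VolumeSnapshot resource. The minimal sketch below uses the standard snapshot.storage.k8s.io/v1 API; the snapshot class and PVC names are hypothetical placeholders.

```yaml
# Minimal storage-centric snapshot sketch. The VolumeSnapshot API is
# standard Kubernetes; csi-snapclass and postgres-data are illustrative names.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-data-snap
  namespace: default
spec:
  volumeSnapshotClassName: csi-snapclass      # assumes a CSI snapshot class exists
  source:
    persistentVolumeClaimName: postgres-data  # the PVC to snapshot
```

A snapshot like this captures the volume's blocks; without application hooks (for example, quiescing the database first), it may not be application-consistent, which is exactly the gap an application-centric tool aims to close.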

Let’s talk a little about the application-centric approach to data operations in Kubernetes. 

An Application-centric Approach with Kanister

For DevOps teams using Kubernetes, Kanister is an open-source project that allows domain experts to capture application-specific data management tasks in blueprints that can be easily shared and extended. First posted on GitHub almost three years ago, Kanister takes care of the tedious details around application data management on Kubernetes and presents a homogeneous operational experience across applications at scale.

Kanister comprises four primary components:

  1. The Kanister controller: A controller that follows the Kubernetes operator pattern and manages Blueprints, ActionSets and Profiles.
  2. Blueprints: Custom resources used to define workflows for operations such as backup, restore or delete. Essentially, they provide the ability to hook into the data service(s); see the sketch after this list.
  3. ActionSets: Custom resources used to execute a specific action from a specific Blueprint.

  4. Profiles: Custom resources that determine the destination for backups or the source for restores (e.g., AWS, Azure Blob Storage or another target).
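To make the Blueprint concept concrete, here is a simplified sketch of a backup action for a PostgreSQL StatefulSet, loosely modeled on the examples in the Kanister documentation. The names, container and command are illustrative, not a verbatim upstream Blueprint.

```yaml
# Simplified Blueprint sketch (illustrative, not a production Blueprint).
apiVersion: cr.kanister.io/v1alpha1
kind: Blueprint
metadata:
  name: postgres-blueprint
  namespace: kanister
actions:
  backup:
    phases:
      - func: KubeExec          # Kanister function: exec into an existing pod
        name: pgDump
        args:
          namespace: "{{ .StatefulSet.Namespace }}"
          pod: "{{ index .StatefulSet.Pods 0 }}"
          container: postgres
          command:
            - bash
            - -c
            - |
              # Stream a logical dump straight to the object store via kando;
              # the Profile referenced by the ActionSet supplies the target.
              pg_dumpall -U postgres |
                kando location push --profile '{{ toJson .Profile }}' --path backup.sql -
```

A restore action would typically mirror this, using kando location pull to feed the stored dump back into the database.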

(A diagram of the flow between these components appears at 10:05 in the video below.)

Kanister offers two additional command-line tools, Kanctl and Kando. Kanctl can be used to create ActionSets and Profiles, while Kando moves data to and from the object store from inside application containers.
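For illustration, an ActionSet like the following, whether written by hand or generated with kanctl, would trigger the backup action of a Blueprint. The resource names (backup-postgres, postgres-blueprint, s3-profile, the postgres StatefulSet) are hypothetical placeholders.

```yaml
# Illustrative ActionSet sketch; all names are hypothetical placeholders.
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  name: backup-postgres
  namespace: kanister
spec:
  actions:
    - name: backup                  # which action in the Blueprint to run
      blueprint: postgres-blueprint # the Blueprint that defines the workflow
      object:                       # the Kubernetes object the action targets
        kind: StatefulSet
        name: postgres
        namespace: default
      profile:                      # the object-store target for the backup data
        name: s3-profile
        namespace: kanister
```

Once created, the Kanister controller picks up the ActionSet and runs the referenced Blueprint's phases against the target object.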

In this informative video, Kasten by Veeam’s Pavan Navarathna discusses data management challenges in Kubernetes and provides a demonstration of Kanister with real data. Users familiar with YAML can easily jump in and try it themselves. The demo was staged in conjunction with the Data on Kubernetes Community (DoKC), “an openly governed group of curious and experienced practitioners, taking inspiration from the CNCF and Apache Software Foundation.” DoKC’s goal is to “assist in the emergence and development of techniques for the use of Kubernetes for data.”

In addition to Kanister's current capabilities, the team is planning to add a guide for writing Blueprints. Blueprints already exist for a number of popular databases, but for users whose database doesn't yet have one, the guide should help them write their own. Also in the works are plans to add file storage as a destination for backups, and to add encryption, compression and deduplication for the data being moved.

Interested in trying Kanister? Visit Kanister.io to learn more, or download or fork the project on GitHub.

 
