A Veeam Kasten for Kubernetes backup action for Longhorn volumes fails with the error message:
too many snapshots created
When integrating with CSI-based volumes, Veeam Kasten for Kubernetes employs VolumeSnapshot resources to create snapshots during backup operations.
With Longhorn, upon the creation of a VolumeSnapshot and its corresponding VolumeSnapshotContent resource by the snapshot-controller, Longhorn generates a snapshots.longhorn.io resource and synchronizes it to produce a Longhorn backend snapshot. As part of its retention policy, Veeam Kasten for Kubernetes deletes the VolumeSnapshotContent resource to remove the snapshot. However, Longhorn does not automatically delete the snapshots.longhorn.io resource it created; the snapshot is merely flagged as removed but not purged from the system.
Over time, this can lead to an accumulation of snapshots for a volume, especially if backups are frequent. Eventually, this may cause the backup process to fail when the number of snapshots reaches Longhorn's maximum limit of 254 per volume.
Below is an example of the snapshot count for an application that was set to retain 8 snapshots in Veeam Kasten for Kubernetes.
# PVC in one sample namespace
❯ kubectl get pvc -n postgresql
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-postgres-postgresql-0   Bound    pvc-fafda05d-314e-420f-bf37-d7365b31ea1c   8Gi        RWO            longhorn       24h

# Count of VolumeSnapshot resources
❯ kubectl get volumesnapshot -n postgresql --no-headers | wc -l
8

# Count of Longhorn snapshot CRs
❯ kubectl get snapshots.longhorn.io -n longhorn-system | grep pvc-fafda05d-314e-420f-bf37-d7365b31ea1c | wc -l
85
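The comparison above can be turned into a quick headroom check against the 254-snapshot limit. The following is a minimal sketch, not part of Veeam Kasten or Longhorn; the helper names count_for_volume and headroom are hypothetical, and the counting step reads kubectl output from standard input so it can be tried without a cluster.

```shell
#!/bin/sh
# Longhorn's per-volume snapshot limit, as stated in this article.
LIMIT=254

# Count the lines on stdin that mention the given volume name.
# (grep -c exits non-zero when the count is 0, hence the "|| true".)
count_for_volume() {
  grep -c -- "$1" || true
}

# Print how many more snapshots fit before the limit is reached.
headroom() {
  echo $(( LIMIT - $1 ))
}

# Usage against a live cluster (assumes kubectl access to longhorn-system):
#   n=$(kubectl get snapshots.longhorn.io -n longhorn-system --no-headers \
#         | count_for_volume pvc-fafda05d-314e-420f-bf37-d7365b31ea1c)
#   headroom "$n"
```

With the count of 85 from the example above, headroom reports 169 remaining snapshots.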
Below is the screenshot from Longhorn UI showing hidden snapshots that are marked as removed but not purged.
Currently, Longhorn does not automatically purge the removed snapshots when the VolumeSnapshot/VolumeSnapshotContent resources are deleted from the Kubernetes cluster.
Starting in Longhorn version 1.4.1, a new type of recurring job was introduced: snapshot-cleanup. This job type purges removed snapshots and system snapshots.
Within Longhorn, configure a recurring job for the snapshot-cleanup task type.
Optionally, add the job to the default group. With default in its groups, the recurring job is automatically scheduled for any volume that has no recurring job of its own.
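The same job can also be declared as a RecurringJob resource instead of through the Longhorn UI. The sketch below writes such a manifest to a file; the values for groups, retain, and concurrency mirror the log output shown in this article, while the file name, the job name, and the hourly cron schedule are assumptions to adjust for your environment. Check the apiVersion against the Longhorn version installed in your cluster before applying.

```shell
#!/bin/sh
# Write a RecurringJob manifest for the snapshot-cleanup task.
# The file name and the hourly schedule are illustrative assumptions.
cat > snapshot-cleanup-job.yaml <<'EOF'
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: snapshot-cleanup
  namespace: longhorn-system
spec:
  task: snapshot-cleanup
  cron: "0 * * * *"   # assumed schedule: at the top of every hour
  groups:
    - default         # applies to volumes with no recurring job of their own
  retain: 0
  concurrency: 1
EOF

# To create the job in the cluster, apply the file:
#   kubectl apply -f snapshot-cleanup-job.yaml
```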
The recurring job creates a Kubernetes CronJob resource, which in turn runs a snapshot-cleanup pod according to the cron expression specified when the job was created.
Below is the log from the snapshot-cleanup pod that ran after the creation of the recurring job.
❯ kubectl logs snapshot-cleanup-28069140-c8cm5 -n longhorn-system
time="2023-05-15T11:00:00Z" level=debug msg="Setting allow-recurring-job-while-volume-detached is false"
time="2023-05-15T11:00:00Z" level=debug msg="Get volumes from label recurring-job.longhorn.io/snapshot-cleanup=enabled"
time="2023-05-15T11:00:00Z" level=debug msg="Get volumes from label recurring-job-group.longhorn.io/default=enabled"
time="2023-05-15T11:00:00Z" level=info msg="Found 1 volumes with recurring job snapshot-cleanup"
time="2023-05-15T11:00:00Z" level=info msg="Creating job" concurrent=1 groups=default job=snapshot-cleanup labels="{\"RecurringJob\":\"snapshot-cleanup\"}" retain=0 task=snapshot-cleanup volume=pvc-84d3d7d0-3abc-427c-a959-5ccc7da912a5
time="2023-05-15T11:00:01Z" level=info msg="job starts running" labels="map[RecurringJob:snapshot-cleanup]" namespace=longhorn-system retain=0 snapshotName=snapshot-90135f33-93ce-4de4-829b-4dd01db2d827 task=snapshot-cleanup volumeName=pvc-84d3d7d0-3abc-427c-a959-5ccc7da912a5
time="2023-05-15T11:00:01Z" level=info msg="Running recurring snapshot for volume pvc-84d3d7d0-3abc-427c-a959-5ccc7da912a5" labels="map[RecurringJob:snapshot-cleanup]" namespace=longhorn-system retain=0 snapshotName=snapshot-90135f33-93ce-4de4-829b-4dd01db2d827 task=snapshot-cleanup volumeName=pvc-84d3d7d0-3abc-427c-a959-5ccc7da912a5
time="2023-05-15T11:00:01Z" level=debug msg="Purged snapshots" labels="map[RecurringJob:snapshot-cleanup]" namespace=longhorn-system retain=0 snapshotName=snapshot-90135f33-93ce-4de4-829b-4dd01db2d827 task=snapshot-cleanup volume=pvc-84d3d7d0-3abc-427c-a959-5ccc7da912a5 volumeName=pvc-84d3d7d0-3abc-427c-a959-5ccc7da912a5
time="2023-05-15T11:00:01Z" level=info msg="Finished recurring snapshot" labels="map[RecurringJob:snapshot-cleanup]" namespace=longhorn-system retain=0 snapshotName=snapshot-90135f33-93ce-4de4-829b-4dd01db2d827 task=snapshot-cleanup volumeName=pvc-84d3d7d0-3abc-427c-a959-5ccc7da912a5
time="2023-05-15T11:00:01Z" level=info msg="Created job" concurrent=1 groups=default job=snapshot-cleanup labels="{\"RecurringJob\":\"snapshot-cleanup\"}" retain=0 task=snapshot-cleanup volume=pvc-84d3d7d0-3abc-427c-a959-5ccc7da912a5
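To confirm from such a log which volumes were actually purged, the "Purged snapshots" entries can be filtered out. This is a small sketch that assumes the log format shown above, with a volumeName= field on each entry; the function name purged_volumes is hypothetical.

```shell
#!/bin/sh
# Read snapshot-cleanup pod logs on stdin and print each volume
# that has a "Purged snapshots" entry, one volume name per line.
purged_volumes() {
  grep 'msg="Purged snapshots"' \
    | sed -n 's/.* volumeName=\([^ ]*\).*/\1/p' \
    | sort -u
}

# Usage against a live cluster (pod name taken from the example above):
#   kubectl logs snapshot-cleanup-28069140-c8cm5 -n longhorn-system | purged_volumes
```

An empty result means the job ran without logging a purge for any volume in that run.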
For more information about recurring jobs, refer to the Longhorn documentation.