Veeam Support Knowledge Base
VM Loses Connection During Snapshot Removal

KB ID:	1681
Product:	Veeam Backup & Replication | 9.5 | 10 | 11 | 12 | 12.1 | 12.2 | 12.3 | 12.3.1
Published:	2012-10-04
Last Modified:	2025-03-11
Languages:	JP

Challenge

During the snapshot removal step of a Veeam Backup & Replication task, the source VMware VM loses connectivity temporarily.

Cause

Veeam does not remove the snapshot itself; it sends an API call to VMware to have the action performed.

The snapshot removal process significantly lowers the total IOPS that the VM can deliver, both because of additional locks on the VMFS storage caused by the increase in metadata updates and because of the added I/O load of the snapshot removal itself. In most environments, if the target storage is already above 30-40% IOPS load, which is not uncommon with a busy SQL or Exchange server, the snapshot removal process will easily push that load past 80% and likely much higher. Most storage arrays incur a significant latency penalty once IOPS load passes 80%, which degrades application performance.

Isolation Testing

Perform the following test when a brief loss of connectivity to the VM is acceptable, for instance, during off-peak hours.

To isolate the VMware snapshot removal event, Veeam suggests the following isolation test:

Create a snapshot on the VM in question.
Leave the snapshot on the VM for the duration of time that a Veeam job runs against that VM.
Remove the snapshot.
Observe the VM during the snapshot removal.

While performing the test above, if you observe the same connectivity issues as during the Veeam job run, the issue likely exists within the VMware environment itself. Review the following list of troubleshooting steps and known issues. If none of the following work to resolve the issue, we advise that you contact VMware support directly regarding the snapshot removal issue.
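To make the "observe the VM" step of the isolation test measurable, the following is a minimal Python sketch that probes a TCP port on the VM at a fixed interval and collapses the results into outage windows. It is an illustration only, not a Veeam or VMware tool: the function names (`tcp_probe`, `monitor`, `find_outages`) and the host and port in the usage comment are assumptions, and the snapshot itself is still created and removed manually as described above.

```python
import socket
import time
from typing import Callable, List, Tuple


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(probe: Callable[[], bool], duration_s: float, interval_s: float = 1.0,
            clock: Callable[[], float] = time.monotonic,
            sleep: Callable[[float], None] = time.sleep) -> List[Tuple[float, bool]]:
    """Call probe() repeatedly and record (seconds_since_start, reachable)."""
    samples = []
    start = clock()
    while True:
        elapsed = clock() - start
        if elapsed > duration_s:
            break
        samples.append((elapsed, probe()))
        sleep(interval_s)
    return samples


def find_outages(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse samples into (start, end) windows during which probes failed.

    A window opens at the first failed probe and closes at the next
    successful one. A window still open at the end is closed at the
    timestamp of the last sample.
    """
    outages = []
    window_start = None
    for timestamp, reachable in samples:
        if not reachable and window_start is None:
            window_start = timestamp
        elif reachable and window_start is not None:
            outages.append((window_start, timestamp))
            window_start = None
    if window_start is not None:
        outages.append((window_start, samples[-1][0]))
    return outages


# Example usage (hypothetical address and port; start it just before
# removing the snapshot and let it run until the removal finishes):
#
#   samples = monitor(lambda: tcp_probe("192.0.2.10", 3389), duration_s=600)
#   for begin, end in find_outages(samples):
#       print(f"unreachable from {begin:.0f}s to {end:.0f}s")
```

Running the same probe during a Veeam job and during the manual test gives two sets of outage windows to compare. Note that a TCP probe only reflects network reachability from the machine running the script, so it should be treated as a rough indicator rather than a direct measurement of stun time.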

Snapshot Stun Troubleshooting

If the stunned VM is stored on an NFS 3.0 datastore, see the section "Known Issue with NFS 3.0 Datastores" below, which covers NFS 3.0 and HotAdd.
Check for snapshots on the VM while no Veeam job is running and remove any that are found.

Veeam Backup & Replication can back up a VM that has snapshots present. However, snapshot stun has been observed when VMware removes the snapshot created by a Veeam job from a VM that already had a snapshot before the job started.
Check for orphaned snapshots on the VM. (See: https://kb.vmware.com/s/article/1005049)
Reduce the number of concurrent tasks that are occurring within Veeam. This will reduce the number of active snapshot tasks on the datastores.
Move the VM to a datastore with more available IOPS, or split the VM's disks across multiple datastores to spread the load more evenly.
If the VM's CPU usage spikes heavily during snapshot consolidation, consider increasing the CPU reservation for that VM.
Ensure you are on the latest build of your current version of vSphere, hypervisors, VMware Tools, and SAN firmware when applicable.
Move the VM to a host with more available resources.
If possible, change the time of day that the VM gets backed up or replicated to a time when the least storage activity occurs.
Use a workingDir to redirect snapshots to a different datastore than the one the VM resides on. (See: https://kb.vmware.com/s/article/1002929)

Known Issue with NFS 3.0 Datastores

Resolved in ESXi 8.0 U2b

The VMware article on this issue has been updated to state that the underlying cause of snapshot stun with NFS 3.0 and Virtual Appliance (HotAdd) transport mode was resolved in the VMware ESXi 8.0 Update 2b release.

Broadcom KB323118: Virtual machines residing on NFS storage become unresponsive during a snapshot removal operation

Issue Description

Symptom: When the backup job completes and the snapshot removal request is processed, the Guest OS of the processed VM is stunned for multiple minutes. However, isolation testing with manual snapshot creation and removal results in only a few seconds of stun.

In ESXi versions prior to 8.0 U2b, there is a known issue with NFS 3.0 Datastores and Virtual Appliance (HotAdd) transport mode. The issue is documented in Broadcom KB323118: Virtual machines residing on NFS storage become unresponsive during a snapshot removal operation:

"This issue occurs when the target virtual machine and the backup appliance [proxy] reside on two different hosts, and the NFSv3 protocol is used to mount NFS datastores. A limitation in the NFSv3 locking method causes a lock timeout, which pauses the virtual machine being backed up [during snapshot removal]."

Solution

The recommended solution is to utilize Direct NFS Access transport mode, which will both improve backup performance and eliminate the risk of snapshot stun associated with NFS 3.0.

By default, the VMware Backup Proxy, when set to automatic transport mode selection, will attempt to utilize transport modes in the following priority:

1. Direct Storage Access (SAN/NFS)
2. Virtual Appliance (HotAdd)
3. Network Mode (NBD)
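The priority order above can be illustrated with a small Python sketch. This is only a model of "pick the first usable mode in priority order" and is not Veeam's actual selection code; the function name `select_transport_mode` and the idea of passing in a set of usable modes are assumptions made for illustration.

```python
from typing import Iterable, Optional

# Priority order as listed in this article for automatic selection.
TRANSPORT_PRIORITY = (
    "Direct Storage Access",
    "Virtual Appliance (HotAdd)",
    "Network Mode (NBD)",
)


def select_transport_mode(usable_modes: Iterable[str]) -> Optional[str]:
    """Return the highest-priority mode that is usable, or None if none is."""
    usable = set(usable_modes)
    for mode in TRANSPORT_PRIORITY:
        if mode in usable:
            return mode
    return None


# If Direct Storage Access is not usable for an NFS-backed disk,
# the next mode in the list is chosen instead.
print(select_transport_mode(["Network Mode (NBD)", "Virtual Appliance (HotAdd)"]))
```

In this model, a proxy that can reach the NFS storage directly never falls through to HotAdd, which is why a HotAdd selection on an NFS-backed datastore points to a configuration gap.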

Therefore, if the VMware Backup Proxy uses Virtual Appliance (HotAdd) to process a disk on an NFS-backed datastore, the environment's configuration has prevented Direct NFS Access from being selected. Review the following:

Direct NFS Access Environment Configuration Requirements

The OS of the VMware Backup Proxy must be able to reach the NFS storage backing the Production Datastore(s).
The IP address of the VMware Backup Proxy's NIC that resides on the same network as the NFS storage must be present on the NFS storage's export whitelist.
The VMware Backup Proxy's Managed Server entry must be rescanned to make Veeam Backup & Replication aware of the NFS access.
[Linux-based Backup Proxy] For Direct NFS access, consider the following:
- Linux backup proxies must have the NFS client package installed.
- Debian-based backup proxies must have the nfs-common package installed.
- RHEL-based backup proxies must have the nfs-utils package installed.
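The Linux package requirement above can be checked with a short script run on the proxy. The following Python sketch assumes the standard `dpkg-query` and `rpm` package tools are present on Debian-based and RHEL-based systems respectively; the function names and the `family` labels (`"debian"`, `"rhel"`) are hypothetical and chosen for this example. It only checks for the package and does not verify network reachability or the export whitelist.

```python
import subprocess
from typing import Callable, List, Tuple

# Package names as given in this article.
PACKAGE_BY_FAMILY = {"debian": "nfs-common", "rhel": "nfs-utils"}


def build_query(family: str) -> List[str]:
    """Return the package-manager query for the NFS client package."""
    package = PACKAGE_BY_FAMILY[family]
    if family == "debian":
        # Prints the package status, e.g. "install ok installed".
        return ["dpkg-query", "-W", "-f=${Status}", package]
    # rpm -q exits with a non-zero status if the package is not installed.
    return ["rpm", "-q", package]


def run_query(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd and return (exit_code, stdout); 127 if the tool is missing."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return 127, ""
    return proc.returncode, proc.stdout


def nfs_client_installed(family: str,
                         runner: Callable[[List[str]], Tuple[int, str]] = run_query) -> bool:
    """Return True if the NFS client package appears to be installed."""
    exit_code, output = runner(build_query(family))
    if exit_code != 0:
        return False
    if family == "debian":
        # A removed package with leftover config files reports another status.
        return output.strip() == "install ok installed"
    return True


# Example usage on a Debian-based proxy:
#
#   if not nfs_client_installed("debian"):
#       print("nfs-common is missing; Direct NFS Access will not be available")
```

After installing a missing package, the proxy's Managed Server entry still needs to be rescanned, as noted in the requirements above.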

Workarounds

If implementing the Direct NFS Access transport mode is not feasible, either of the following workarounds can be used to change which transport mode is used and avoid the NFS 3.0 limitation. Both workarounds produce functional backups but offer neither the simplicity nor the performance of a correctly configured Direct NFS Access transport mode.

Workaround 1: Force Veeam Backup & Replication to use the HotAdd Proxy on the same host as the VM

Note: With this workaround, Veeam Backup & Replication detects which host the VM to be backed up is running on and assigns the VMware Backup Proxy on that same host to perform disk processing.

Create a VMware Backup Proxy on every host in the VMware cluster where backups occur.
Create the following registry value on the Veeam Backup Server:

Key Location: HKLM\Software\Veeam\Veeam Backup and Replication\
Value Name: EnableSameHostHotaddMode
Value Type: DWORD (32-Bit) Value
Value Data: 1 or 2 (described below)

Veeam automatically scans for registry values every 15 minutes. Wait 15 minutes for the value to take effect, or stop all jobs and reboot to force the value to be checked.

The value data can be set to 1 or 2. Either value forces Veeam Backup & Replication to first attempt to use, and wait for, the Proxy on the same host as the VM to be backed up. The difference between the two is as follows:
- 1: If the VMware Backup Proxy on the same host as the VM becomes unavailable, Veeam Backup & Replication will fail over to any available proxy and use any available transport mode.
  Note: This may lead to a situation where a proxy on another host is selected and HotAdd is used, which may cause stun. This option ensures the highest backup performance but risks VM stun.
- 2: If the VMware Backup Proxy on the same host as the VM becomes unavailable, Veeam Backup & Replication will use any available proxy, but only with the Network Mode transport mode.
  Note: This option minimizes the risk of snapshot stun but can reduce backup performance when Network Mode is forced.
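As an alternative to creating the value by hand in the Registry Editor, the Windows `reg add` command can create it. The Python sketch below only builds the argument list for that command and validates the value data; the function name `build_reg_add` is hypothetical, and the trailing backslash of the key location is omitted here as an assumption to avoid quoting problems on the command line. Review the resulting command before running it from an elevated prompt on the Veeam Backup Server.

```python
from typing import List

# Key and value name as given in this article (trailing backslash dropped).
REGISTRY_KEY = r"HKLM\Software\Veeam\Veeam Backup and Replication"
VALUE_NAME = "EnableSameHostHotaddMode"

# Summary of the fallback behavior described in this article.
FALLBACK_BEHAVIOR = {
    1: "any available proxy, any available transport mode",
    2: "any available proxy, Network Mode only",
}


def build_reg_add(value_data: int) -> List[str]:
    """Return the 'reg add' argument list for the given value data (1 or 2)."""
    if value_data not in FALLBACK_BEHAVIOR:
        raise ValueError("value data must be 1 or 2")
    return [
        "reg", "add", REGISTRY_KEY,
        "/v", VALUE_NAME,
        "/t", "REG_DWORD",
        "/d", str(value_data),
        "/f",  # overwrite an existing value without prompting
    ]


# Example: prefer the same-host proxy and fall back to Network Mode only.
command = build_reg_add(2)
print(command)
print("Fallback:", FALLBACK_BEHAVIOR[2])
# On the Veeam Backup Server the list could be passed to subprocess.run(command).
```

Choosing 2 in this example reflects the lower-risk option described above; 1 would be built the same way.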

Workaround 2: Force all VMware Backup Proxies to use Network Mode

This workaround restricts the transport mode selection to Network Mode, which can severely impact backup performance in environments where 10Gb networking is not in use. The vSphere environment assigns a lower priority to Network Mode traffic, and in environments using 1Gb networking, this can result in Network Mode traffic being throttled significantly.

Edit the proxies listed under [Backup Infrastructure] > [Backup Proxies].
Click the [Choose] button next to “Transport mode”.
Select the radio option for “Network” mode.
Click [OK] to close the prompt and then [Finish] to commit the change.
