EMC-VNX 1/2 and ESXi: Potential Data Unavailability

Another ETA (EMC Technical Advisory) hit my inbox over the weekend, this time relating to VNX/VNX2 and ESXi, where a condition exists that could result in data being unavailable.

If you are running a VNX with a code base of 05.32.000.5.219 or earlier, or a VNX2 with a code base of 05.33.006.5.102 or earlier, you may be affected.

The full ETA can be found here, but here’s a summary:


The workaround is to disable VAAI Hardware Accelerated Locking (ATS, referred to here as CAS). The fix involves a code upgrade to 05.33.008.5.119 for VNX2. A resolution for VNX 1 is on the way. A patch is available for those running on 05.32.000.5.218.

So either way, VAAI CAS should be temporarily disabled until the upgrade is completed.

This is not great, but not the end of the world, so don’t get hysterical 🙂

Anyway, here’s how to disable VAAI locking using PowerCLI. This example targets all hosts managed by the vCenter Server you are logged in to.

There are 3 VAAI settings available (DataMover.HardwareAcceleratedMove, DataMover.HardwareAcceleratedInit and VMFS3.HardwareAcceleratedLocking), but only VMFS3.HardwareAcceleratedLocking should be disabled in the hosts’ advanced configuration. A PowerCLI one-liner will make the change; it is non-disruptive and takes effect immediately, without requiring a host reboot. As with any change though, consider doing it during a maintenance window and exercise proper change control.
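
A minimal sketch of such a one-liner, assuming a session already opened with Connect-VIServer and a PowerCLI release that provides Get-AdvancedSetting and Set-AdvancedSetting (older releases used the VMHostAdvancedConfiguration cmdlets instead). It is an illustration rather than the exact command from the ETA:

```powershell
# Assumption: Connect-VIServer has already been run against the vCenter Server.
# Disable hardware-accelerated locking (ATS) on every host in the inventory.
# A value of 0 turns the primitive off; 1 turns it on.
Get-VMHost |
    Get-AdvancedSetting -Name 'VMFS3.HardwareAcceleratedLocking' |
    Set-AdvancedSetting -Value 0 -Confirm:$false
```

Swapping -Confirm:$false for -WhatIf should list what would change without touching any host, which is a handy dry run before the maintenance window.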

You could easily narrow the scope by using Get-Datacenter, Get-Datastore, Get-VDSwitch (etc.) to build the list of hosts.
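
As a rough sketch of that scoping, the host list can be built from a container object first and then piped on. The names 'DC01' and 'VNX_DS01' are hypothetical placeholders, and a connected PowerCLI session is assumed:

```powershell
# Hypothetical names: replace 'DC01' and 'VNX_DS01' with objects from your own inventory.

# Only hosts in one datacenter.
$vmHosts = Get-Datacenter -Name 'DC01' | Get-VMHost

# Or: only hosts that have a particular datastore mounted.
# $vmHosts = Get-Datastore -Name 'VNX_DS01' | Get-VMHost

# Apply the change to the scoped host list only.
$vmHosts |
    Get-AdvancedSetting -Name 'VMFS3.HardwareAcceleratedLocking' |
    Set-AdvancedSetting -Value 0 -Confirm:$false
```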

To check the status and confirm it’s off:
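
One way to run that check, sketched under the same assumptions (connected session, Get-AdvancedSetting available). It lists all three VAAI settings per host so you can see that only the locking value is 0:

```powershell
# The three VAAI advanced settings; only the locking one should read 0 after the workaround.
$vaaiSettings = 'DataMover.HardwareAcceleratedMove',
                'DataMover.HardwareAcceleratedInit',
                'VMFS3.HardwareAcceleratedLocking'

Get-VMHost |
    Get-AdvancedSetting -Name $vaaiSettings |
    Sort-Object Entity, Name |
    Select-Object Entity, Name, Value
```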


After the upgrade is done, enable VAAI locking again:
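
Re-enabling is the same sketch with the value flipped back to 1, again assuming a connected session and the Get-AdvancedSetting / Set-AdvancedSetting cmdlets:

```powershell
# Re-enable hardware-accelerated locking (ATS) on every host once the array upgrade is complete.
Get-VMHost |
    Get-AdvancedSetting -Name 'VMFS3.HardwareAcceleratedLocking' |
    Set-AdvancedSetting -Value 1 -Confirm:$false
```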

Verify VAAI is doing its thing via esxtop:

From the disk device screen (press u), toggle fields (press f), select the VAAI counters (press O and P), then press Enter to return to the active window.

You will now see the column headers on the right, which will show data as VAAI does its job.

[Screenshot: esxtop disk device view showing the VAAI counter columns]

CLONE_RD shows the number of FullCopy reads, CLONE_WR shows the number of FullCopy writes, and CLONE_F is the number of failed FullCopy executions.

Let’s force through some data to test by svMotioning a guest and seeing if the counters move.

[Screenshots: the esxtop VAAI counters before, during and after the svMotion]


All in all, a bit of a nuisance, but not too much fuss to get sorted.

 

 

 


4 thoughts on “EMC-VNX 1/2 and ESXi: Potential Data Unavailability”

  1. Excellent write-up! I was curious, why disable all three VAAI settings? I went through this at the beginning of October, when the ETA first came out, and it only mentions hardware assisted locking. I re-read the ETA, just in case it was expanded to include the other features, but still only mentions locking.

    Thanks!

  2. I read the EMC ETA too. Doing it “on the fly” is absolutely not advised.
    If datastores are ATS-only, disabling VAAI without stopping the I/O could result in an ESXi host keeping its ATS lock on a datastore and preventing others from accessing it.

    1. Fair comment, Peter. It’s incumbent on anyone making these changes to practice proper change control and take adequate precautions.
      I note that the VMware KB referred to in the ETA states “Enabling or disabling ATS should not impact any virtual machines running, but VMware recommends to ensure this change during a maintenance window.”
