PowerCLI – Break glass in case of emergency

Nothing like a bit of hyperbole, but some days are better than others! One thing I have learned over the years is to always expect the unexpected. Yesterday’s ‘unexpected’ was one of these:

[Screenshot: PSOD on the first host]

That’s cool, HA kicked in and the stranded guests began to restart as expected. Not great, but it had done its job.

When the host returned, I began to migrate guests back to it (this is an Essentials Plus environment, so no DRS). Then guess what? Boom, on the other host:

[Screenshot: PSOD on the second host]

A couple of minutes later the first host went down again, and again, and a lovely little PSOD loop was in full swing!

What made this particularly ‘fun’ was the fact that this was a 2-node cluster, and it was in a different country. What topped it off was Java being a PITA and not allowing me to load an iDRAC viewer to do some triage.

A few deep breaths later I got past my Java woes, and a 2-minute analysis showed the cause to be a bug around E1000 (VMware KB 2059053).

The workaround is simple: remove all E1000 NICs from guests. The problem was that vCenter is virtual, the 2 hosts were yo-yoing and HA was going crazy. I couldn’t get access to a console or anything else to remediate.

I powered down both hosts cold, and brought one up. Once I could get access, I could see that all the guests had come up powered off, all 70 of them. The phones were starting to run hot, and people were getting very concerned.

This is where you learn to truly appreciate the power and efficiency of PowerCLI.

This command showed me all the E1000s that needed changing:

[Screenshot: PowerCLI command listing the guests with E1000 adapters]
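Since the screenshot does not survive as text, here is a rough sketch of the kind of filter that does this job. It is not necessarily the exact command I ran on the night. `Select-E1000Adapter` is a made-up helper name, and the live pipeline in the comments assumes you have already run `Connect-VIServer`.

```powershell
# Hypothetical helper: keeps only adapters whose Type is exactly e1000
# (so e1000e is not matched). It works on any object with a Type property,
# which means it can be tried out without a vCenter connection.
function Select-E1000Adapter {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Adapter
    )
    process {
        # "$($Adapter.Type)" turns the adapter type into text; -eq is case-insensitive
        if ("$($Adapter.Type)" -eq 'e1000') { $Adapter }
    }
}

# Against a live session (assumes Connect-VIServer has already been run):
# Get-VM | Get-NetworkAdapter | Select-E1000Adapter |
#     Select-Object Parent, Name, Type, NetworkName, MacAddress
```

Keeping the filter separate from the `Get-VM | Get-NetworkAdapter` part means the same check can be reused when it comes to changing the adapters.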

 

I cobbled together a quick script to go ahead and change them all, leaving all properties intact. Since the guests were already powered off, once they were powered back on we were good to go.

It is not overly complicated: it finds the affected guests, uses a sub-expression and some input string cleanup so the output of the foreach loop hits the pipeline, then changes the NIC type without confirmation. This saved a lot of time compared with manually checking, removing and re-adding the NIC for each affected guest, and got people back to work quickly.
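The original script is not reproduced here, so treat the following as a sketch of the same idea in a simpler form than the foreach and sub-expression approach described above: feed the affected adapters straight into `Set-NetworkAdapter`. It assumes a connected session and powered-off guests. `Convert-E1000Adapter` is a hypothetical wrapper name, and VMXNET3 as the replacement type is my assumption for the example.

```powershell
# Hypothetical wrapper: retypes every e1000 adapter it receives from the pipeline.
# -NewType defaults to Vmxnet3, which is an assumption made for this sketch.
function Convert-E1000Adapter {
    [CmdletBinding(SupportsShouldProcess = $true)]
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Adapter,
        [string]$NewType = 'Vmxnet3'
    )
    process {
        # Skip anything that is not an e1000 adapter
        if ("$($Adapter.Type)" -ne 'e1000') { return }

        # Honours -WhatIf, so the change can be previewed before it is made
        $target = "$($Adapter.Parent) / $($Adapter.Name)"
        if ($PSCmdlet.ShouldProcess($target, "Change type to $NewType")) {
            Set-NetworkAdapter -NetworkAdapter $Adapter -Type $NewType -Confirm:$false
        }
    }
}

# Dry run first, then for real (guests powered off, session already connected):
# Get-VM | Get-NetworkAdapter | Convert-E1000Adapter -WhatIf
# Get-VM | Get-NetworkAdapter | Convert-E1000Adapter
```

The `-WhatIf` pass is cheap insurance: when 70 guests are down, you want to see the list of what will be touched before anything actually changes.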

A painful hour, and the hosts were upgraded the next day. Lessons learnt, but all good…


4. List snapshots older than x days, and their size:

[Screenshot: PowerCLI output listing old snapshots and their sizes]

 

Old snapshots… Someone has been naughty…

All of the inputs/scopes can be filtered by Cluster, Datacenter, Resource Pool etc.
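As a text stand-in for the snapshot screenshot, here is a minimal sketch of an age filter with a size column. `Select-OldSnapshot`, the 7-day default and the cluster name `Prod` are all placeholders of mine. The code also assumes the snapshot objects expose a `SizeGB` property, as current PowerCLI releases do; on older releases you may need `SizeMB` instead.

```powershell
# Hypothetical helper: keeps snapshots created more than -Days days ago and
# reports their size. Expects objects with VM, Name, Created and SizeGB
# properties (assumed to match what Get-Snapshot returns).
function Select-OldSnapshot {
    param(
        [Parameter(ValueFromPipeline = $true)]
        $Snapshot,
        [int]$Days = 7   # assumed threshold; pick your own "x"
    )
    begin {
        # Anything created before this point counts as old
        $cutoff = (Get-Date).AddDays(-$Days)
    }
    process {
        if ($Snapshot.Created -lt $cutoff) {
            $Snapshot | Select-Object VM, Name, Created,
                @{ Name = 'SizeGB'; Expression = { [math]::Round($_.SizeGB, 2) } }
        }
    }
}

# Whole inventory, or scoped to a container ('Prod' is a placeholder cluster name):
# Get-VM | Get-Snapshot | Select-OldSnapshot -Days 14
# Get-VM -Location (Get-Cluster 'Prod') | Get-Snapshot | Select-OldSnapshot -Days 14
```

The `-Location` parameter on `Get-VM` is where the scoping happens, so the same filter works whether you point it at a cluster, a datacenter or a resource pool.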

Add these to your PowerShell cookbooks, you never know when they might come in useful.

As always, test them, and use caution in your environments.

 

 

 
