Tuesday, June 28, 2016

On Intel NUCs and the value of EVC

As I mentioned in the previous post, I am adding an Intel NUC to my home lab. Here is where the *fun* begins.

I added the NUC to my cluster and tried to vMotion a machine to it - worked like a champ!  I tried another machine and hit this issue.

The way I read this was that the NUC didn't support AES-NI or PCLMULQDQ. Oh man. Did I buy the wrong unit? Is there something wrong with the one I have?! I started searching, and everything pointed to the NUC supporting AES-NI.

Did I mention I moved the NUC to the basement? Yeah, there is no monitor down there, so I brought it back up to the office and connected it up. I went through every screen in the BIOS looking for AES settings and turned up nothing.

I opened a case with Intel and also tried their @Intelsupport Twitter account. We had a good exchange where they confirmed the unit supported AES-NI, and they even opened a case for me on the back end. I will say, given the vapid response from most vendors, @Intelsupport was far ahead of the rest - good job! This part of the story ran off and on through Sunday and into parts of Monday.

If you've ever taken a troubleshooting class or studied methodology, you might notice a mistake I made. Instead of reading the whole error message and *understanding* it, I locked in on AES-NI and followed that rat hole far too long. Hindsight being 20/20 and all, I figured I'd share my pain in the hope it'll help someone else avoid it. Now, back to my obsession...

It's now Tuesday morning and I decided I would boot the NUC to Linux and verify for myself that the CPU supported AES. I again used Rufus to create a bootable Debian USB and ran the "lscpu" command, where I could see "aes" in the jumble of text. Hint - use grep for aes and it'll highlight it in red, or use "grep -m1 -o aes /proc/cpuinfo". I verified it was there, so I decided I would try a similar path through ESXi. I found the KB "Checking CPU information on an ESXi host", but the display for the capabilities is pretty obtuse. As I sat there thinking about where to go next, I looked at the error message again, decided I should read the KB "vMotion/EVC incompatibility issues due to AES/PCLMULQDQ", and it hit me.
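For anyone repeating the Linux check, here is a minimal Python sketch of the same idea as the grep above, extended to look for both flags named in the error. The function names are my own for illustration, and it assumes the usual /proc/cpuinfo layout where a "flags" line lists the CPU features.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found


def missing_features(cpuinfo_text, wanted=("aes", "pclmulqdq")):
    """List the wanted flags that the CPU does not advertise, in the order given."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


def missing_on_this_host(path="/proc/cpuinfo"):
    """Run the check against the local machine (Linux only)."""
    with open(path) as handle:
        return missing_features(handle.read())
```

On the Debian USB boot, calling missing_on_this_host() should return an empty list if the CPU advertises both features.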
My new host needed to be "equalized" with my older hosts. A few clicks later, I had Enhanced vMotion Compatibility (EVC) enabled with a baseline of Nehalem CPU features.
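The "equalizing" idea can be pictured as hiding anything that not every host can offer. The sketch below is only a toy model: real EVC uses predefined baselines such as Nehalem rather than computing an intersection, and the host names and feature lists here are made up for illustration.

```python
def common_baseline(hosts):
    """Return the features that every host in the mapping supports."""
    feature_sets = [set(features) for features in hosts.values()]
    if not feature_sets:
        return set()
    return set.intersection(*feature_sets)


def hidden_features(hosts):
    """For each host, list the features that would be masked from its VMs."""
    baseline = common_baseline(hosts)
    return {
        name: sorted(set(features) - baseline)
        for name, features in hosts.items()
    }


# Hypothetical lab: an older host without AES-NI and a newer NUC with it.
lab = {
    "old-host": ["sse4.2"],
    "nuc": ["sse4.2", "aes", "pclmulqdq"],
}
print(hidden_features(lab))
```

In this toy lab the NUC's extra features are the ones that get masked, which is why a VM started under the common baseline can move in either direction.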


Now I can vMotion like a champ and life is good. The NUC is back in the basement and has VMs running happily on it. Two lessons learned: RTFEM (Read the freaking error message), and the VMware Knowledge Base (KB) is a great resource. At the end of the day, I learned a lot, and ultimately that's what it's all about, right?  :)
