As I mentioned in the previous post, I am adding an Intel NUC to my home lab. Here is where the *fun* begins.
I added the NUC to my cluster and tried to vMotion a machine to it -
worked like a champ! I tried another machine and hit this issue.
The way I read this was that the NUC didn't support AES-NI or PCLMULQDQ. Oh
man. Did I buy the wrong unit? Is there something wrong with the one I
have?! I started searching, and everything points to the NUC
supporting AES-NI. Did I mention I moved the NUC to the basement? Yeah,
there is no monitor down there, so I brought it back up to the office
and connected it up. I went through every screen in the BIOS looking
for AES settings and turned up nothing. I opened a case with Intel and
also tried their @Intelsupport Twitter account. We had a good exchange
where they confirmed the unit supported AES-NI and they even opened a
case for me on the back end. I will say that, given the vapid responses from
most vendors, @Intelsupport was far ahead of the rest - good job! This
part of the story spans Sunday, off and on, and parts of Monday.
If you've ever taken a troubleshooting class or studied methodology, you
might notice a mistake I made. Instead of reading the whole error
message and *understanding* it, I locked in on AES-NI and followed that
rat hole far too long. Hindsight being 20/20 and all, I figured I'd share
my pain in the hope it'll help someone else avoid it. Now, back to my
obsession...
It's now Tuesday morning, and I decided I
would boot the NUC to Linux and verify for myself that the CPU supported AES. I
again used Rufus to create a bootable Debian USB and ran the "lscpu"
command, where I could see "aes" in the jumble of text. Hint - pipe the output through grep
for aes and it'll highlight it in red, or use "grep -m1 -o aes
/proc/cpuinfo". I verified it was there, so I decided to try a similar
path through ESXi. I found the KB "Checking CPU information on an ESXi host",
but the display for the capabilities is pretty obtuse. As I sat there
thinking about where to go next, I looked at the error message again and
decided I should read the KB "vMotion/EVC incompatibility issues due to AES/PCLMULQDQ", and it hit me.
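As an aside, the Linux check above is easy to script if you'd rather not eyeball the output. This is only a rough sketch: it assumes the x86 /proc/cpuinfo layout with a "flags" line, and the function names (cpu_flags, missing_flags) are my own, not part of any tool.

```python
from pathlib import Path

# The two CPU features named in the vMotion error message.
WANTED = {"aes", "pclmulqdq"}


def cpu_flags(cpuinfo_text):
    """Return the flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. a non-x86 system)


def missing_flags(cpuinfo_text, wanted=WANTED):
    """Return the wanted flags that the CPU does not report."""
    return set(wanted) - cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        missing = missing_flags(path.read_text())
        print("missing:", sorted(missing) if missing else "none")
```

Run it from the live Linux USB; if it reports nothing missing, the CPU itself is fine and the problem is somewhere else.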
My new host needed to be "equalized" with my older hosts. A few clicks
later, I had configured Enhanced vMotion Compatibility (EVC) on the cluster with a baseline
of Nehalem CPU features.
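To picture what "equalizing" means, here is a toy sketch of the idea. It is not how vCenter implements EVC (real EVC uses predefined baselines per CPU generation rather than computing an intersection), and the host names and feature lists are made up for illustration, assuming an older host that lacks the two features from the error message.

```python
def common_baseline(host_features):
    """Intersect per-host feature sets: the features every host can offer."""
    sets = [set(features) for features in host_features.values()]
    return set.intersection(*sets) if sets else set()


def masked_features(host_features):
    """For each host, list the features hidden from VMs to match the baseline."""
    baseline = common_baseline(host_features)
    return {host: set(features) - baseline
            for host, features in host_features.items()}


if __name__ == "__main__":
    # Hypothetical hosts and feature names, for illustration only.
    hosts = {
        "old-host": {"sse4_2", "popcnt"},
        "nuc": {"sse4_2", "popcnt", "aes", "pclmulqdq"},
    }
    print("baseline:", sorted(common_baseline(hosts)))
    for host, hidden in masked_features(hosts).items():
        print(host, "hides:", sorted(hidden))
```

The point is that VMs only ever see the shared baseline, so they can move between hosts without noticing a CPU feature appear or disappear underneath them.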
Now I can vMotion like a champ and life is good. The NUC is back in the
basement and has VMs running happily on it. Two lessons learned: RTFEM
(Read The Freaking Error Message), and the VMware Knowledge Base articles (KBs) are a great resource. At the end of the day, I learned a
lot and ultimately, that's what it's all about, right? :)