CCIE5851: 2016

Monday, December 19, 2016

vRNI 3.2 Installation – Step by Step

This week (12/8/16) VMware released vRealize Network Insight (vRNI) 3.2 and added some really cool capabilities to the platform. Some highlights are below.

· XML Export of Firewall Rules

· Support of NSX Edge NAT

· Application Centric Micro-segmentation

· Online upgrade

· NSX configuration, health and capacity checks

· Handful of other misc. features

I plan to blog about many of these features, but let’s start with the basics and see what we need to do and get the ball rolling. First, download the 2 files needed, the platform and proxy OVAs. Be warned, this is close to 14GB of files. If like me, you have slow(ish) Internet, patience is a virtue. You’ll want to make sure you have resources to meet the requirements.

Now that you have the files and meet the specs, let’s get started. I’ll be using the fat client because, well….I use it when I don’t have to use the web client. On a side note, I hear the 6.5 client is awesome but have not used it and NSX and vRNI don’t support it yet today. Anyways, find the data center you want to deploy the OVF to and start the process like below. You’ll start with the Platform as that’s the main engine for the product.

You’ll go through the usual process of showing the requirements, EULA, and location.

Step 1

Step 2

Step 3

Step 4

Step 5

The first real question you’ll need to address is the size of the configuration. Your choices are Medium or Large, depending on the size of your deployment and the volume of data you’ll collect. The easy decoder is to let the number of VMs you have drive the sizing. If you have ~3,000 VMs, Medium will work fine, and if you have ~6,000 or more, choose Large. As always, these are “it depends” numbers and you can work with your account team to see the rest of the equation and make the best choice for you.

Step 6

Step 7

Step 8

Step 9

Step 10

Extra settings on the properties page

Step 11

Choose your resources like storage and network connectivity, fill in the blanks on the IP address detail you’ll use and let it rip.

Step 12

Step 13

Depending on your config this could take quite a while or go quickly. Hopefully you selected the Power on after Deployment button because you’ll want it to come online before you deploy the proxy. While the platform is coming online and loading all of the processes, let’s be productive and start to deploy the proxy. It is important to note that the platform must be online for the proxy to be deployed.

Once again we’ll go through the typical OVF process with requirements, EULA, location, configuration size, storage, and network mapping.

Step 1

Step 2

Step 3

Step 4

Step 5

Step 6

Step 7

Step 8

Step 9

What will be new is that when you get to the Properties page you’ll need to generate the Shared Secret. This is used to link the platform with the proxy. Hopefully you’ve powered on the platform and hit its IP/DNS name with Chrome. The first thing will be to apply your license, activate it and log in (admin@local/admin is the default account). Now you can generate the shared secret and use the handy Copy button to paste it to the OVF window. Fill in the rest of the blanks and again, Power on after deployment.

Step 10

Step 11

Step 12

Step 13

Step 14

Step 15

Step 16

It’ll spin and take some time depending on your config. Remember, these are big OVAs and so go get some coffee or check your email. After the proxy comes up, it’ll be automatically detected by the platform and you’re done!

In the next blog, we’ll talk about how to start getting it all set up to collect data.

Links:

vRNI 3.2 Main Document Page - https://www.vmware.com/support/pubs/vrealize-network-insight-pubs.html

vRNI Upgrade Guide to 3.2 - https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2148271

Want to learn more about vRNI? This free MyLearn class is a great place to start!
https://mylearn.vmware.com/mgrReg/courses.cfm?ui=www_edu&a=one&id_subject=78247

Wednesday, November 23, 2016

Countries I have visited

It seems today is going to be one of those "brain itch" days where pent up blog post ideas come to fruition. This is a great example of one of those.

I was talking with a colleague this week and he asked where all I had been and I wished I had a concise list to share. I have tossed this around as an idea before and now it's done. I'll add a new country next April when we visit New Zealand but until then, this is as current as I can recall.

While I am at it, here's the US map of places I have visited.

I need to get around a bit more in my opinion. I enjoy seeing new places. Most countries I have visited I would go back again but there are a few I can live the rest of my life without going to again. Out of respect for everyone I won't post the names of the places I didn't like.

Race Schedule for 2017

Updated 11/14/17
As a bit of a departure from my normal technical blogs I wanted to get this out there about what I currently enjoy doing on weekends and challenging myself. That said….

VMworld 2016 Sessions I am Excited About

As VMworld draws closer and the scheduler tool is active, I am really looking at the sessions I'll try to attend while I am in Vegas. I expect to actually sit in just a few of them because as a VMware employee, we cannot pre-register and must wait until very close to the session start time. This policy, of course, makes perfect sense as ultimately it is a customer-first event but it's nice to get to see the other sessions as well.

Changing the IP Address of NSX Manager

I had a customer ask me a simple question and I assumed that there would be a simple answer. How do I change the IP address of my NSX Manager? Good question - I asked around and it seemed like a simple process but being a big fan of "Trust but verify" I wanted to do this in my lab and below is the process I used.

VMworld Discount for CCIEs or CCDEs

This is pretty exciting news and deserved more than just a Tweet. If you are a CCIE or CCDE and planning to attend VMworld you qualify for a $300 discount!

Drop an email to vmworldteam@vmware.com with the subject line "Cisco Live Promotion" and be sure to include your first and last names and your certification number. You'll be contacted by the VMworld team with additional instructions.

Pretty cool deal! Better get on it - it's a limited time offer!

Wednesday, July 6, 2016

NSX host health-status

Yet another operational enhancement that came to light in NSX 6.2.3 is a quick way to validate ESXi host health with a focus on NSX requirements but also overall host health. It's a simple command available from the NSX Manager Central CLI.

NSX 6.2.3 and Triggered Edge Failover

One of the less glamorous but nice to have features in NSX 6.2.3 is the ability to trigger the failover of NSX Edge appliances. Being an SE, the most common use case for this that I have is during a proof of concept (POC) with a customer. Certainly there are many other use cases that range from testing and validating that your failover setup actually works, to DR plans, to troubleshooting, and the list goes on. I am sure you can think of others that might not have occurred to me.

Today this ability requires the use of an API call to execute, which for me is good as I need to force myself to get more comfortable in the API world. What can I say - 21 years of CLI on Novell NetWare, IBM S390 and AS/400s and plenty of Cisco terminal time has ingrained habits that are hard to break. So how do I get started with APIs? Easy enough - search for "REST API plugin" and you'll find it. I am using "RESTClient" in Firefox and it looks like this.

In my home network I have an Edge Services Gateway (ESG) that is configured in Active/Standby mode. You can figure out which one is active using the "show service highavailability" command from the CLI.

To set the HA admin state to down you must use a REST API call that includes details about the object you are working with. The specific element is "<haAdminState>up</haAdminState>" under the NSX Edge appliance section. In my environment I first needed to do a GET operation to see what the rest of the variables I needed to complete the call were.

The response I got was all of this - where we see the cluster ID, host ID, VM ID and other attributes about the guest. Most important is the new haAdminState attribute, which is up, meaning our active/standby is working as expected.
<appliance>
  <highAvailabilityIndex>1</highAvailabilityIndex>
  <vcUuid>501bc163-173d-85eb-f8f5-a90eabc15595</vcUuid>
  <vmId>vm-196</vmId>
  <haAdminState>up</haAdminState>
  <resourcePoolId>domain-c31</resourcePoolId>
  <resourcePoolName>DC1-MgmtEdge</resourcePoolName>
  <datastoreId>datastore-51</datastoreId>
  <datastoreName>Synology</datastoreName>
  <hostId>host-43</hostId>
  <hostName>dc1-edge02.fuller.net</hostName>
  <vmFolderId>group-v22</vmFolderId>
  <vmFolderName>vm</vmFolderName>
  <vmHostname>NSX-edge-2-1</vmHostname>
  <vmName>dc1-edge-01-1</vmName>
  <deployed>true</deployed>
  <edgeId>edge-2</edgeId>
  <configuredResourcePool>
    <id>domain-c31</id>
    <name>DC1-MgmtEdge</name>
    <isValid>true</isValid>
  </configuredResourcePool>
  <configuredDataStore>
    <id>datastore-51</id>
    <name>Synology</name>
    <isValid>true</isValid>
  </configuredDataStore>
</appliance>

Now we have the details we need to change the state of our edge appliance. All we need to do is a PUT with the state reading down instead of up.
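
As a minimal sketch of that edit, the snippet below takes the XML returned by the GET, flips haAdminState, and returns the body you would send back with the PUT. It only uses Python's standard library. The function name set_ha_admin_state is made up for this example and is not part of NSX, and the trimmed sample XML stands in for the full response shown earlier. The request URL and credentials are deliberately left out because this post doesn't spell them out and they depend on your environment.

```python
import xml.etree.ElementTree as ET


def set_ha_admin_state(appliance_xml: str, state: str) -> str:
    """Return the appliance XML with <haAdminState> set to 'up' or 'down'."""
    if state not in ("up", "down"):
        raise ValueError("state must be 'up' or 'down'")
    root = ET.fromstring(appliance_xml)
    node = root.find("haAdminState")  # direct child of <appliance>
    if node is None:
        raise ValueError("no <haAdminState> element in the appliance XML")
    node.text = state
    return ET.tostring(root, encoding="unicode")


# Trimmed copy of the GET response; in practice pass the whole document.
sample = (
    "<appliance>"
    "<highAvailabilityIndex>1</highAvailabilityIndex>"
    "<vmId>vm-196</vmId>"
    "<haAdminState>up</haAdminState>"
    "<edgeId>edge-2</edgeId>"
    "</appliance>"
)

put_body = set_ha_admin_state(sample, "down")
print(put_body)
```

Everything except haAdminState is passed through untouched, which matches the idea of sending back what the GET returned with only the state changed.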

Now being a big fan of trust but verify, how do we see the change was made? Let's connect to the Edge appliance VM and take a look. We can see the status is now Standby and the timestamp when the state changed.

If we check out the now active Edge appliance we see what we'd expect to see - Active and a timestamp that is before the appliance that was active became standby.

Next, let's see what the logs look like when the PUT was executed. Below are the initial entries I see on what was the standby as it transitions to active. There are dozens of other messages associated with startup, but I wanted to capture the state change. Note the use of BiDirectional Forwarding Detection (BFD). BFD is one of the neater protocols out there, IMHO. I'll blog about it in the future but know that I think BFD is a Big Freaking Deal (BFD) - that's nerd humor for you. ;)
2016-07-05T13:46:02+00:00 NSX-edge-2-0 lcp-daemon: [daemon.notice] ovs|00079|lcp|INFO|status/get handler
2016-07-05T13:46:02+00:00 NSX-edge-2-0 lcp-daemon: [daemon.notice] ovs|00080|lcp|INFO|ha_node_is_active: id 0 status 1 active 0 (0)
2016-07-05T13:46:02+00:00 NSX-edge-2-0 lcp-daemon: [daemon.notice] ovs|00081|lcp|INFO|ha_node_is_active: id 1 status 1 active 1 (0x1)
2016-07-05T13:52:29+00:00 NSX-edge-2-0 bfd: [daemon.notice] ovs|00029|bfd_main|INFO|Send diag change notification to LCP
2016-07-05T13:52:29+00:00 NSX-edge-2-0 lcp-daemon: [daemon.notice] ovs|00082|lcp|INFO|Processing BFD notification
2016-07-05T13:52:29+00:00 NSX-edge-2-0 lcp-daemon: [daemon.notice] ovs|00083|lcp|INFO|BFD session for 169.254.1.5:169.254.1.6 state: Up (0) diag: Admin Down (7)
2016-07-05T13:52:29+00:00 NSX-edge-2-0 lcp-daemon: [daemon.notice] ovs|00084|lcp|INFO|ha_node_set_active: node 1 active 0
2016-07-05T13:52:29+00:00 NSX-edge-2-0 lcp-daemon: [daemon.notice] ovs|00085|lcp|INFO|Node 1 status changed from Up to Admin Down
2016-07-05T13:52:29+00:00 NSX-edge-2-0 lcp-daemon: [daemon.notice] ovs|00086|lcp|INFO|HA state Standby, processing event BFD State Updated reason Updated
2016-07-05T13:52:29+00:00 NSX-edge-2-0 lcp-daemon: [daemon.notice] ovs|00087|lcp|INFO|ha_node_is_active: id 1 status 0 active 0 (0)
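
If you want to pull just the state changes out of a long Edge log, a rough sketch like the one below can help. The regular expression is based only on the "Node N status changed from X to Y" line in the excerpt above, so treat the pattern as an assumption about the log format rather than anything documented. The function name is hypothetical.

```python
import re

# Pattern derived from the sample log line in this post, not from NSX documentation.
STATUS_CHANGE = re.compile(
    r"^(?P<timestamp>\S+) (?P<edge>\S+) .*"
    r"Node (?P<node>\d+) status changed from (?P<old>.+?) to (?P<new>.+)$"
)


def find_status_changes(log_text: str) -> list:
    """Return one dict per 'status changed' line found in the log text."""
    events = []
    for line in log_text.splitlines():
        match = STATUS_CHANGE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


# Two lines copied from the excerpt: only the second is a status change.
log = """\
2016-07-05T13:52:29+00:00 NSX-edge-2-0 lcp-daemon: [daemon.notice] ovs|00084|lcp|INFO|ha_node_set_active: node 1 active 0
2016-07-05T13:52:29+00:00 NSX-edge-2-0 lcp-daemon: [daemon.notice] ovs|00085|lcp|INFO|Node 1 status changed from Up to Admin Down
"""

for event in find_status_changes(log):
    print(event["timestamp"], event["edge"], "node", event["node"],
          event["old"], "->", event["new"])
```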

This state change does persist across a reboot as you can see below.

A simple PUT operation with the admin state to UP is all that is needed to bring this appliance back into service as a standby. Pretty neat feature that was added in 6.2.3.

Wednesday, June 29, 2016

Making Operations Easier for NSX Distributed Firewall

Operations isn't a very sexy topic, but having been there, done that, I know how important it is to an organization. The easier a product makes itself to support, the better. With that in mind, I'd like to share an enhancement VMware NSX has made to the Central CLI.

In NSX the Distributed Firewall (DFW) policy is enforced on the ESXi host where the guest resides. This avoids hairpinning of traffic and allows horizontal scale while securing traffic as close to the source as possible. The DFW logs created are stored locally on the host instead of being centralized to the NSX Manager or another location. Until NSX 6.2.3 collecting DFW logs meant customers needed to determine which host the VM is on, connect to the host and get the log and turn around and upload it to the Global Support Services (GSS) for analysis. Not a difficult process but there was an opportunity for improvement.

In NSX 6.2 we added a feature called Central CLI which enabled customers to connect to the NSX Manager via SSH and issue commands that would go out and grab whatever information you told it to collect and then display it from the single SSH session. This in and of itself was an improvement but we didn't stop there. In NSX 6.2.3 the ability to collect a support bundle for DFW logs from a host and copy them (via SCP) to a target server was added.

Note: As of NSX 6.2.3 this is a command that must be executed in enable mode of NSX Manager.

Here's the new command in action where I'll have the NSX Manager collect the support logs from host 49, bundle them up into a compressed TAR file and copy them to a CentOS host on my network.

dc1-nsxmgr01> ena
Password:
dc1-nsxmgr01# export host-tech-support host-49 scp root@192.168.10.9:/home/logs/
Generating logs for Host: host-49...
scp /tmp/VMware-NSX-Manager-host-49--dc1-compute03.fuller.net--host-tech-support--06-28-2016-16-29-31.tgz root@192.168.10.9:/home/logs/
The authenticity of host '192.168.10.9 (192.168.10.9)' can't be established.
ECDSA key fingerprint is SHA256:jiutMmcrUKH9fZgsuR8VfNoQEz8oq0ubVPATeAXMoxg.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.10.9' (ECDSA) to the list of known hosts.
root@192.168.10.9's password:
dc1-nsxmgr01#

We can see it triggered the creation of the logs on the host and then used SCP to copy the bundle to the destination (192.168.10.9 in my network).

If we look on the destination server we can see the file.

[root@lab-tools logs]# ls -l
total 932
-rw-r--r--. 1 root root 952931 Jun 28 16:40 VMware-NSX-Manager-host-49--dc1-compute03.fuller.net--host-tech-support--06-28-2016-16-40-31.tgz
[root@lab-tools logs]#
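
If you end up collecting bundles from many hosts, the file name itself tells you which host it came from and when it was created. Here is a small sketch that splits it apart. The layout is inferred from the single file name shown in this post, so it is an assumption and not a documented format, and parse_bundle_name is a made-up helper name.

```python
from datetime import datetime


def parse_bundle_name(filename: str) -> dict:
    """Split a host tech-support bundle name into its parts.

    Assumes the '--'-separated layout seen in this post:
    <prefix>-<host-id>--<hostname>--host-tech-support--<MM-DD-YYYY-HH-MM-SS>.tgz
    """
    stem = filename[:-len(".tgz")] if filename.endswith(".tgz") else filename
    prefix, hostname, kind, stamp = stem.split("--")
    # The host ID is the trailing 'host-NN' of the first field.
    host_id = prefix[prefix.rindex("host-"):]
    return {
        "host_id": host_id,
        "hostname": hostname,
        "kind": kind,
        "created": datetime.strptime(stamp, "%m-%d-%Y-%H-%M-%S"),
    }


info = parse_bundle_name(
    "VMware-NSX-Manager-host-49--dc1-compute03.fuller.net"
    "--host-tech-support--06-28-2016-16-40-31.tgz"
)
print(info["host_id"], info["hostname"], info["created"].isoformat())
```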

So there you have it, an easier way to collect support logs. I'll have an additional blog on how to track a VM down that has been moving around due to DRS.

Let me know if you like posts with an operational focus.

Tuesday, June 28, 2016

On Intel NUCs and the value of EVC

As I mentioned in the previous post, I am adding an Intel NUC to my home lab. Here is where the *fun* begins.

I added the NUC to my cluster and tried to vMotion a machine to it - worked like a champ! I tried another machine and hit this issue.

The way I read this was that the NUC didn't support AES-NI or PCLMULQDQ. Oh man. Did I buy the wrong unit? Is there something wrong with the one I have?! I started searching, and everything points to the NUC supporting AES-NI. Did I mention I moved the NUC to the basement? Yeah, there is no monitor down there, so I brought it back up to the office and connected it up. I went through every screen in the BIOS looking for AES settings and turned up nothing. I opened a case with Intel and also tried their @Intelsupport Twitter account. We had a good exchange where they confirmed the unit supported AES-NI and they even opened a case for me on the back end. I will say given the vapid response from most vendors, @Intelsupport was far ahead of the rest - good job! This part of the story spans Sunday off and on and parts of Monday.

If you've ever taken a troubleshooting class or studied methodology, you might notice a mistake I made. Instead of reading the whole error message and *understanding* it, I locked in on AES-NI and followed that rat hole far too long. Hindsight being 20/20 and all, I figured I'd share my pain in the hopes it'll help someone else avoid it. Now, back to my obsession.....

It's now Tuesday morning and I decided I would boot the NUC to Linux and verify the CPU supported AES myself. I again used Rufus to create a bootable Debian USB and ran the "lscpu" command where I could see "aes" in the jumble of text. Hint - use grep for aes and it'll highlight it in red, or use "grep -m1 -o aes /proc/cpuinfo". I verified it was there, so I decided I would try a similar path through ESXi. I found the KB "Checking CPU information on an ESXi host" but the display for the capabilities is pretty obtuse. As I sat there thinking about where to go next, I looked at the error message again, decided I should read the KB "vMotion/EVC incompatibility issues due to AES/PCLMULQDQ" and it hit me.
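
For anyone who would rather script the Linux-side check, here is a minimal sketch that looks for both features named in the error message at once. It parses text in the /proc/cpuinfo format; the function names and the sample string are made up for illustration, and "aes" and "pclmulqdq" are the flag names Linux uses for these features.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text: str, wanted=("aes", "pclmulqdq")) -> list:
    """List the wanted flags that the CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [name for name in wanted if name not in flags]


# Hypothetical, heavily shortened cpuinfo text standing in for the real file.
sample = "processor\t: 0\nflags\t\t: fpu sse2 pclmulqdq aes\n"
print(missing_features(sample))
```

On the box itself, pass the contents of /proc/cpuinfo instead of the sample string. An empty list means both features are advertised by the CPU.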

My new host needed to be "equalized" with my older hosts. A few clicks and I had Enhanced vMotion Compatibility (EVC) configured with a baseline of Nehalem CPU features.
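
One way to picture what EVC does: every host in the cluster presents only the CPU features in the chosen baseline, so a VM never sees a feature on one host that is missing on another. The toy model below uses made-up, shortened feature lists purely for illustration. They are not real EVC baselines or real host inventories, and visible_features is a hypothetical name.

```python
def visible_features(host_features: set, baseline: set) -> set:
    """Features a VM can see once the host is masked down to the baseline."""
    return host_features & baseline


# Made-up, shortened feature lists for illustration only.
baseline = {"sse4.2", "popcnt"}
host_a = {"sse4.2", "popcnt"}
host_b = {"sse4.2", "popcnt", "aes", "pclmulqdq"}

seen_on_a = visible_features(host_a, baseline)
seen_on_b = visible_features(host_b, baseline)

# With the mask applied, both hosts look identical to the VM.
print(seen_on_a == seen_on_b)
# Features host_b has but no longer exposes under this baseline.
print(sorted(host_b - seen_on_b))
```

Without a common baseline, a VM that started on a host exposing the extra features could not be moved to a host that lacks them, which is the kind of mismatch the vMotion error was complaining about.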

Now I can vMotion like a champ and life is good. The NUC is back in the basement and has VMs on it running happily. Two lessons learned - RTFEM (Read the freaking error message) and the VMware Knowledge Base (KBs) are a great resource. At the end of the day, I learned a lot and ultimately, that's what it's all about, right? :)

Intel NUC Added to the home lab

I am excited about the newest addition to the home lab which is an Intel NUC. Specifically I have a NUC6 with an i3 processor, 32GB of RAM and a 128GB SSD. I've been looking for a smaller footprint for the lab for some time and the NUC kept rising to the top of the list. I looked around at other options and the NUC just seemed the simplest way to go. I had originally hoped for a solution to get past the 32GB RAM limitation but after pricing 64GB kits, I decided 32GB would do the trick.

My current lab is 2 Dell C1100 (CS-24TY) servers I bought on eBay last year. These servers have 72GB of RAM and boot from ESXi on a USB stick with no local disk. They've done the job well but are power hungry. I knew they'd suck the watts down but didn't realize how much until I connected a Kill-a-watt meter to the plug for the UPS and saw I hovered around 480w between the two servers and a Cisco 4948 switch. While power isn't super expensive where I live, it adds up and I had noticed an increase in our power bill. Thus, the drive for the NUC - not to mention the benefit of having a local SSD.
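
To put a rough number on "it adds up", a quick back-of-the-envelope calculation helps. This assumes the 480w draw is constant around the clock, which is a simplification, and it uses a placeholder electricity price that is not taken from this post. Swap in the rate from your own bill.

```python
def annual_kwh(watts: float, hours_per_day: float = 24.0) -> float:
    """Energy used in a year by a constant load, in kilowatt-hours."""
    return watts / 1000.0 * hours_per_day * 365


def annual_cost(watts: float, price_per_kwh: float) -> float:
    """Yearly cost of a constant load at a flat price per kWh."""
    return annual_kwh(watts) * price_per_kwh


ASSUMED_PRICE_PER_KWH = 0.10  # placeholder rate, not from the post

print(f"{annual_kwh(480):.1f} kWh per year")
print(f"{annual_cost(480, ASSUMED_PRICE_PER_KWH):.2f} per year at the assumed rate")
```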

I ordered the NUC from Amazon and it was delivered and I finally had some free time to work on it Sunday. The box is heavier than I expected but comfortably so, not like a lead weight. The build was easy after removing 4 captive screws from the base and popping in the RAM and SSD.

Here's what I ordered on Amazon

Ready to start

Once I popped the case open.

I used Rufus to create a bootable USB with ESXi and connected the NUC to one of my monitors. I turned on the NUC and the only light was a blue LED on the "front" of the unit - it's a square so front is relative based on the stamp on the baseplate. Nothing on my monitor at all. I made sure the connections were secure, power cycled the unit and started to wonder if I had a bad box. I was using a DisplayPort adapter to a VGA output but only saw "No signal" on the monitor. The NUC has an HDMI output but I don't have HDMI on my monitors. I do have DVI so figured I'd give it a try and was rewarded with seeing my NUC boot. Not sure why this was the case but I was eager to get started so didn't dig into it further.

My first attempt at installing ESXi was with 6.0 Update 1 and it didn't detect the NIC. I read other blog posts about adding the driver but figured I would give ESXi 6.0 Update 2 a go. When it booted it found the NIC and we were up and grooving.

After ESXi was installed I decided to flash the BIOS to the current revision. It's been a while since I have spent much time in a BIOS (AMI anyone?) and I was amazed that the Visual BIOS supports network connectivity and a mouse, and is not nearly as obtuse as I remember! I downloaded the newest BIOS from Intel's site (0044 revision) and applied the update, though I used the recovery method and not the GUI (old habits die hard). Easy as can be.

Now it was time to turn this thing up and really get cooking. I moved the unit to the basement with the rest of the lab equipment because, well, what could go wrong? I had the right software, verified network connectivity and was all set. Famous last words.

If you want to read about my pain - here's part 2.
