CCIE5851: 2011

Sunday, March 20, 2011

OTV Deep Dive - Part 3

After a long delay, let's pick up where we left off with our OTV deep dive. This post focuses on a key OTV concept that is critical to understand: how we localize our First Hop Redundancy Protocols (FHRPs). These protocols are Hot Standby Router Protocol (HSRP v1 and v2), Virtual Router Redundancy Protocol (VRRP), and Gateway Load Balancing Protocol (GLBP). They allow two network devices to share a common IP address to be used as the default gateway on a subnet and provide redundancy and load balancing to clients in that subnet.
Before we can discuss FHRP localization, let's review why this might be significant to our design. Typically with FHRPs the members of the group are local to each other both logically and physically. Depending on the FHRP there is load balancing or redirection between the devices to the "active" member to handle traffic. This works well when considered locally and most of us use it without a second thought.
When we start to stretch or extend our VLANs across distances, latency is introduced. While a 1ms one-way latency may not sound significant, when accumulated over a complete flow or transaction, it can become quite detrimental to performance. This is exacerbated if the two devices are both in the same location, but have default gateways in another data center. Suboptimal switching and routing at its finest. This effect is referred to as tromboning traffic and is illustrated below where device A needs to talk with device B and the default gateway resides across a stretched VLAN.
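To put a rough number on how that latency accumulates, here is a minimal Python sketch. It assumes A and B sit in the same data center while both of their default gateways sit in the other one, so every packet crosses the interconnect twice in each direction. The function name and the 500 round trips are hypothetical values chosen for illustration, not measurements.

```python
def added_latency_ms(one_way_dci_ms: float, round_trips: int,
                     crossings_per_direction: int = 2) -> float:
    """Extra latency a transaction accumulates when traffic trombones.

    A packet from A to B crosses the interconnect once to reach the remote
    gateway and once more to come back, hence 2 crossings per direction.
    The reply from B to A takes the same detour.
    """
    per_round_trip = 2 * crossings_per_direction * one_way_dci_ms
    return per_round_trip * round_trips


if __name__ == "__main__":
    # Hypothetical transaction: 500 request/response exchanges, 1 ms one-way.
    print(added_latency_ms(1.0, 500), "ms of added latency")
```

With a localized gateway the crossings drop to zero and so does the penalty, which is the whole point of the filtering described next.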

We address this with OTV by implementing filters that prevent the FHRP peers in opposite data centers from seeing each other, so the FHRP becomes localized to each site. There are two approaches: one uses a MAC access list, which we won't cover, and the other, recommended one uses an IP ACL applied as a VLAN ACL (VACL). To be fair, both work equally well in my experience, but the IP ACL is easier to operationalize and I am a staunch believer in making networks easier to maintain and avoiding what I refer to as Science Fair Projects. We've all worked on, inherited or (hopefully not!) created a Science Fair Project - let's avoid that. ;)

The configuration for the IP ACL looks like this:

ip access-list HSRP_IP
10 permit udp any 224.0.0.2/32 eq 1985
20 permit udp any 224.0.0.102/32 eq 1985

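This access list matches the multicast addresses for HSRPv1 and HSRPv2, and can be modified for VRRP and GLBP.
The access list is then applied as a VACL to keep the FHRP hellos from entering OTV through the internal interfaces. The VACL looks like the one below, where we filter HSRP on VLANs 16 and 23. The second sequence references an ACL named ALL that is not shown here; judging by the TCAM output further down, it is a simple permit of all IP traffic.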

vlan access-map HSRP_Local 10
match ip address HSRP_IP
action drop
vlan access-map HSRP_Local 20
match ip address ALL
action forward
vlan filter HSRP_Local vlan-list 16,23
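
To make the first-match behavior of the access-map concrete, here is a small Python sketch that mimics the two sequences. It is only a model of the logic, not how the switch implements it; the `Packet` class and function names are made up for illustration, and the ALL ACL is assumed to match every IP packet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    protocol: str                   # "udp", "tcp", "icmp", ...
    dst_ip: str
    dst_port: Optional[int] = None


# Mirrors "ip access-list HSRP_IP" from the post.
HSRP_IP = [
    ("udp", "224.0.0.2", 1985),     # HSRPv1 hellos
    ("udp", "224.0.0.102", 1985),   # HSRPv2 hellos
]


def matches_hsrp_ip(pkt: Packet) -> bool:
    return (pkt.protocol, pkt.dst_ip, pkt.dst_port) in HSRP_IP


def vacl_action(pkt: Packet) -> str:
    """Evaluate the access-map sequences in order; the first match wins."""
    if matches_hsrp_ip(pkt):        # sequence 10: match HSRP_IP, action drop
        return "drop"
    return "forward"                # sequence 20: match ALL (assumed permit ip any any)


if __name__ == "__main__":
    for pkt in (Packet("udp", "224.0.0.2", 1985),
                Packet("udp", "224.0.0.102", 1985),
                Packet("tcp", "10.1.1.10", 443)):
        print(pkt, "->", vacl_action(pkt))
```

Note that GLBP hellos (UDP port 3222 to 224.0.0.102) and VRRP advertisements (IP protocol 112 to 224.0.0.18) would be forwarded by this filter as written, which is why the ACL needs extra entries if you run those protocols.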

If you are like me and want to verify your VACL is applied and matching, the steps are not as easy as we’d like them to be, but the capability does exist. *NOTE* that I am not responsible for you monkeying around with any of the other commands available when you attach to the module. You’ve been warned. :)
The first thing to do is attach to the module where your internal interfaces physically are. In the example below, it’s module 1. If your OTV is configured in a non-default VDC, you’ll need to set the parser to use that VDC as below.

champs1# attach mod 1
Attaching to module 1 ...
To exit type 'exit', to abort type '$.'
module-1# vdc 3
module-1# show system internal access-list input statistics
VLAN 16 :
=========
Tcam 1 resource usage:
----------------------
Label_b = 0x806
Bank 0
------
IPv4 Class
Policies: VACL(HSRP_Local) [Merged]
Entries:
[Index] Entry [Stats]
---------------------
[0013] deny udp 0.0.0.0/0 224.0.0.102/32 eq 1985 [1863]
[0014] deny udp 0.0.0.0/0 224.0.0.2/32 eq 1985 [4121]
[0015] permit ip 0.0.0.0/0 0.0.0.0/0 [1766386]

VLAN 23 :
=========
Tcam 1 resource usage:
----------------------
Label_b = 0x806
Bank 0
------
IPv4 Class
Policies: VACL(HSRP_Local) [Merged]
Entries:
[Index] Entry [Stats]
---------------------
[0013] deny udp 0.0.0.0/0 224.0.0.102/32 eq 1985 [1863]
[0014] deny udp 0.0.0.0/0 224.0.0.2/32 eq 1985 [4121]
[0015] permit ip 0.0.0.0/0 0.0.0.0/0 [1766386]
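
If you would rather not eyeball the counters, a short Python sketch like the one below can pull the hit counts out of pasted output. The regular expression is an assumption based purely on the entry lines shown above; the format of this internal command may differ between releases and line cards.

```python
import re

# Matches entry lines such as:
# [0013] deny udp 0.0.0.0/0 224.0.0.102/32 eq 1985 [1863]
ENTRY_RE = re.compile(r"^\[(\d+)\]\s+(deny|permit)\s+(.*?)\s+\[(\d+)\]$")


def parse_entries(output: str):
    """Return one dict per TCAM entry line found in the pasted output."""
    entries = []
    for line in output.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            index, action, rule, hits = m.groups()
            entries.append({"index": int(index), "action": action,
                            "rule": rule, "hits": int(hits)})
    return entries


def deny_hits(entries) -> int:
    """Total hits on deny entries, i.e. hellos the VACL has dropped."""
    return sum(e["hits"] for e in entries if e["action"] == "deny")


SAMPLE = """\
[Index] Entry [Stats]
[0013] deny udp 0.0.0.0/0 224.0.0.102/32 eq 1985 [1863]
[0014] deny udp 0.0.0.0/0 224.0.0.2/32 eq 1985 [4121]
[0015] permit ip 0.0.0.0/0 0.0.0.0/0 [1766386]
"""

if __name__ == "__main__":
    print("dropped hellos:", deny_hits(parse_entries(SAMPLE)))
```

Running the parse twice a few seconds apart and comparing the deny counters is a simple way to confirm the filter is actively matching.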

With this configuration, the FHRP in each data center will be locally active and mitigate the tromboning we mentioned earlier. This has a significant impact in that now we only send traffic across the Data Center Interconnect (DCI) that needs to go across as the local routers in each site can service the traffic.

Note that this technique is useful for optimizing egress traffic but does nothing to help draw or “attract” traffic into the right data center. Other technologies that provide that functionality will be the topic of future blogs. ;)

One last step when performing FHRP isolation is to exclude the FHRP MAC addresses from being advertised by OTV. You might be thinking OTV won't know about the FHRP MACs because of the VACL, right? Wrong. :) Due to the nature of MAC address learning, OTV will learn the MAC addresses before the VACL drops the hellos, so we need to tell OTV not to advertise them. This is a three-part process: we define the MAC list, add it to a route-map and then apply it to the OTV IS-IS process as shown below.

mac-list OTV_HSRP seq 10 deny 0000.0c07.ac00 ffff.ffff.ff00
mac-list OTV_HSRP seq 11 deny 0000.0c9f.f000 ffff.ffff.ff00
mac-list OTV_HSRP seq 15 deny 0100.5e00.0000 ffff.ffff.ff00
mac-list OTV_HSRP seq 20 permit 0000.0000.0000 0000.0000.0000

route-map OTV_HSRP_filter permit 10
match mac-list OTV_HSRP

otv-isis default
vpn Overlay0
redistribute filter route-map OTV_HSRP_filter
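
To see which addresses the mac-list above actually catches, here is a Python sketch of the match logic. It assumes the mask works the usual way (bits set in the mask must equal the base address), that entries are evaluated in sequence order, and that anything falling off the end is denied; the function names are invented for illustration.

```python
def mac_to_int(mac: str) -> int:
    """Convert a dotted MAC such as 0000.0c07.ac00 to an integer."""
    return int(mac.replace(".", ""), 16)


# (sequence, action, base address, mask) copied from the mac-list in the post.
MAC_LIST = [
    (10, "deny",   "0000.0c07.ac00", "ffff.ffff.ff00"),
    (11, "deny",   "0000.0c9f.f000", "ffff.ffff.ff00"),
    (15, "deny",   "0100.5e00.0000", "ffff.ffff.ff00"),
    (20, "permit", "0000.0000.0000", "0000.0000.0000"),
]


def evaluate(mac: str) -> str:
    """Return the action of the first entry whose masked bits match."""
    value = mac_to_int(mac)
    for _seq, action, base, mask in MAC_LIST:
        m = mac_to_int(mask)
        if (value & m) == (mac_to_int(base) & m):
            return action
    return "deny"  # assumed implicit deny; unreachable here because of seq 20


if __name__ == "__main__":
    for mac in ("0000.0c07.ac01",   # HSRPv1 group 1
                "0000.0c9f.f001",   # HSRPv2 group 1
                "0000.0c9f.f100",   # HSRPv2 group 256
                "0050.56aa.bbcc"):  # an arbitrary host MAC
        print(mac, "->", evaluate(mac))
```

One thing this makes visible: HSRPv2 virtual MACs use the last three hex digits for the group number, so with the mask as written, groups above 255 (for example 0000.0c9f.f100) fall through to the permit. If you use HSRPv2 group numbers that high, a wider mask such as ffff.ffff.f000 on sequence 11 is probably what you want.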

We’ll cover AED election and some other fun topics in the next post (hopefully sooner rather than later).

As always, your comments and feedback are appreciated!

Sunday, February 20, 2011

OTV Deep Dive - Part Two

Now that we've covered OTV theory and nomenclature, let's dig in to the fun stuff and talk about the CLI and what OTV looks like when it's set up. We'll be using the topology below comprised of four Nexus 7000s and eight VDCs.

We'll focus first on the minimum configuration required to get basic OTV adjacency up and working and then add in multi-homing for redundancy. First, make sure the L3 network that OTV will be traversing is multicast enabled. Today with current shipping code, neighbor discovery is done via multicast which helps facilitate easy additions and removal of sites from the OTV network. With this requirement met, we can get rolling.

A simple initial config is below and we'll dissect it.

First, we enable the feature
feature otv

Then we configure the Overlay interface
interface Overlay1

Next we configure the join interface. This is the interface that will be used for the IGMP join and will be the source IP address of all packets after encapsulation.
otv join-interface Ethernet1/7.1

Now we'll configure the control group. As its name implies the control group is the multicast group used by all OTV speakers in an Overlay network. This should be a unique multicast group in the multicast network.
otv control-group 239.192.1.1

Then we configure the data group, which is used to encapsulate any L2 multicast traffic that is being extended across the Overlay. Any L3 multicast will be routed off of the VLAN through whatever regular multicast mechanisms exist on the network.
otv data-group 239.192.2.0/24

The next-to-last piece of the bare minimum config is the list of VLANs to be extended.
otv extend-vlan 31-33,100,1010,1088-1089

Finally, no shut to enable the interface.
no shutdown

We can now look at the Overlay interface but honestly, won't see much. Force of habit after a no shut on an interface. :)

show int o1
Overlay1 is up
BW 1000000 Kbit
Last clearing of "show interface" counters never
RX
0 unicast packets 77420 multicast packets
77420 input packets 574 bits/sec 0 packets/sec
TX
0 unicast packets 0 multicast packets
0 output packets 0 bits/sec 0 packets/sec

If we configure the other hosts in our network and multicast is working, we'll see adjacencies form as below.

champs1-OTV# show otv adj

Overlay Adjacency database

Overlay-Interface Overlay1 :
Hostname System-ID Dest Addr Up Time State
champs2-OTV 001b.54c2.41c4 10.100.251.14 2d05h UP
fresca-OTV 0026.9822.ea44 10.100.251.78 2d05h UP
pepsi-OTV f866.f206.fd44 10.100.251.82 2d05h UP

champs1-OTV#

With this in place, we now have a basic config and will be able to extend VLANs between the four devices.

The last thing we'll cover in this post is how multi-homing can be enabled. First, to level set: by multi-homing in this context I'm referring to the ability to have redundancy in each site and not have a crippling loop.

The way this is accomplished in OTV is by the use of the concept of a site VLAN. The site VLAN is a VLAN that's dedicated to OTV and NOT extended across the Overlay but is trunked between the two OTV edge devices. This VLAN doesn't need any IP addresses or SVIs created, it just needs to exist and be added to the OTV config as shown below.

otv site-vlan 99
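
Since the site VLAN must not be extended, a quick sanity check before committing the config can save some grief. The Python sketch below expands a VLAN list in the same comma-and-range notation used by otv extend-vlan and confirms the site VLAN is not in it. The helper names are hypothetical; this is an offline check on strings, not something that talks to the switch.

```python
from typing import List


def expand_vlans(spec: str) -> List[int]:
    """Expand a list such as '31-33,100' into [31, 32, 33, 100]."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (int(x) for x in part.split("-", 1))
            vlans.update(range(low, high + 1))
        else:
            vlans.add(int(part))
    return sorted(vlans)


def site_vlan_is_safe(site_vlan: int, extend_spec: str) -> bool:
    """True when the site VLAN is not among the extended VLANs."""
    return site_vlan not in expand_vlans(extend_spec)


if __name__ == "__main__":
    extended = "31-33,100,1010,1088-1089"   # from the otv extend-vlan line
    print(expand_vlans(extended))
    print("site VLAN 99 safe:", site_vlan_is_safe(99, extended))
```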

With the simple addition of this command the OTV edge devices will discover each other locally and then use an algorithm to determine a role each edge device will assume on a per VLAN basis. This role is called the Authoritative Edge Device (AED). The AED is responsible for forwarding all traffic for a given VLAN including broadcast and multicast traffic. Today the algorithm aligns with the VLAN ID with one edge device supporting the odd numbered VLANs and the other supporting the even numbered VLANs. This can be seen by reviewing the output below.

champs1-OTV# show otv vlan

OTV Extended VLANs and Edge Device State Information (* - AED)

VLAN Auth. Edge Device Vlan State Overlay
---- ----------------------------------- ---------- -------
31* champs1-OTV active Overlay1
32 champs2-OTV inactive(Non AED) Overlay1
33* champs1-OTV active Overlay1

1000 champs2-OTV inactive(Non AED) Overlay1
1010 champs2-OTV inactive(Non AED) Overlay1
1088 champs2-OTV inactive(Non AED) Overlay1
1089* champs1-OTV active Overlay1

If we look at the output above we can see that this edge device is the AED for VLANs 31, 33 and 1089 and is the non-AED for 32, 1000, 1010 and 1088. In the event of a failure of champs2, champs1 will take over and become the AED for all VLANs.
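
The odd/even split and the failover behavior can be modeled in a few lines of Python. This is a simplified sketch that assumes the AED is picked by VLAN ID modulo the number of live edge devices in the site, with the device order chosen so the result reproduces the output above; the real election has more moving parts, and the function name is made up.

```python
from typing import List


def elect_aed(vlan: int, live_edge_devices: List[str]) -> str:
    """Pick the AED for a VLAN from the site's live edge devices.

    With two devices, index 0 gets the even VLANs and index 1 the odd ones.
    With a single survivor, every VLAN maps to it.
    """
    if not live_edge_devices:
        raise ValueError("no live edge devices in the site")
    return live_edge_devices[vlan % len(live_edge_devices)]


if __name__ == "__main__":
    site = ["champs2-OTV", "champs1-OTV"]   # order assumed to match the output
    for vlan in (31, 32, 33, 1000, 1010, 1088, 1089):
        print(vlan, elect_aed(vlan, site))

    # champs2 fails: champs1 becomes AED for everything.
    print(32, elect_aed(32, ["champs1-OTV"]))
```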

We'll explore FHRP localization and what happens across the OTV control group in the next post. As always, your thoughts, comments and feedback are welcome.

Wednesday, February 16, 2011

OTV Deep Dive - Part One

I've been meaning to do this for a long time and now that I have the blog and am awake in the hotel room at 3AM, what better thing to do than talk about a technology I've been fortunate enough to work with for almost a year. This will be a series of posts as I'd like to take a structured approach to the technology and dig into the details and mechanics as well as operational aspects of the technology.

Overlay Transport Virtualization (OTV) is a feature available on the Nexus 7000 series switches that enables extension of VLANs across Layer 3 networks. This enables new options of data center scale and design that have not been available in the past. The two common use cases I've worked with customers to implement are data center migration and workload mobility. Interestingly, many jump straight to a multiple physical data center scenario and start to consider stretched clusters and worry about data sync issues. While OTV can provide value in those scenarios, it is also a valid solution inside the data center, where L3 interconnects may segment the network but the need for mobility is present.

OTV is significant in its ability to provide this extension without the hassles and challenges associated with traditional Layer 2 extension such as merging STP domains, MAC learning and flooding. OTV is designed to drop STP BPDUs across the Overlay interface, which means STP domains on each side of the L3 network are not merged. This is significant in that it minimizes fate sharing, where an STP event in one domain ripples to other domains. Additionally, OTV uses IS-IS as its control plane to advertise MAC addresses and provide capabilities such as loop avoidance and optimized traffic handling. Finally, OTV doesn't have state that needs to be maintained, as is required with pseudowire transports like EoMPLS and VPLS. OTV is an encapsulating technology and as such adds a 42 byte header to each frame transported across the Overlay. Below is the frame format in more detail.

We'll start by defining the components and interfaces used when discussing OTV. Refer to the topology below.

We have a typical data center aggregation layer based on Nexus 7000 which is our boundary between Layer 2 and Layer 3. The two switches, Agg1 and Agg2 utilize a Nexus technology, virtual Port Channel (vPC) to provide multi-chassis Etherchannel (MCEC) to the OTV Edge devices. In this topology, the OTV edge devices happen to be Virtual Device Contexts (VDC) that share the same sheet metal as the Agg switches but are logically separate. We'll dig into VDCs more in future blog posts, but know that VDCs are a very, very powerful feature within NX-OS on the Nexus 7000.

Three primary interfaces are used in OTV. The internal interface as its name implies is internal to OTV and is where the VLANs that are to be extended are brought in to the OTV network. These are normal Ethernet interfaces running at Layer 2 and can be trunks or access ports depending on your network's needs. It is important to note that the internal interfaces *DO* participate in STP and as such, considerations such as rootguard and appropriate STP prioritization should be taken into account. In most topologies you wouldn't want, or need the OTV edge device to be the root though if that works in your topology, OTV will work as desired.

The next interface is the join interface which is where the encapsulated L2 frames are placed on the L3 network for transport to the appropriate OTV edge device. The join interface has an IP address and behaves much as a client in that it issues IGMP requests to join the OTV multicast control group. In some topologies it is desirable to have the join interface participate in a dynamic routing protocol and that is not a problem either. As mentioned earlier, OTV encapsulates traffic and adds a 42 byte header to each packet so it may be prudent to ensure your transit network can support packets larger than 1500 bytes. Though not a requirement, performance may suffer if jumbo frames are not supported.
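
The MTU arithmetic is simple, but it is worth writing down. The Python sketch below just adds the 42 byte overhead to the host MTU and checks it against a transit MTU; the function names and the 9216 byte jumbo value are illustrative assumptions.

```python
OTV_OVERHEAD_BYTES = 42   # header size stated in the post


def required_transit_mtu(host_mtu: int = 1500) -> int:
    """Smallest transit MTU that carries a full-size host packet unfragmented."""
    return host_mtu + OTV_OVERHEAD_BYTES


def fits(host_mtu: int, transit_mtu: int) -> bool:
    """True when an encapsulated full-size packet fits the transit MTU."""
    return required_transit_mtu(host_mtu) <= transit_mtu


if __name__ == "__main__":
    print("needed for 1500 byte hosts:", required_transit_mtu())
    print("fits a 1500 byte transit:", fits(1500, 1500))
    print("fits a 9216 byte transit:", fits(1500, 9216))
```

In other words, hosts sending standard 1500 byte packets need at least 1542 bytes end to end across the transit network.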

Finally, the Overlay interface is where OTV specific configuration options are applied to define key attributes such as multicast control groups, VLANs to be extended and join interfaces. The Overlay interface is where the (in)famous 5 commands to enable OTV are entered, though anyone who's worked with the technology recognizes more than 5 commands are needed for a successful implementation. :) The Overlay interface is similar to a Loopback interface in that it's a virtual interface.

In the next post, we'll discuss the initial OTV configuration and multi-homing capabilities in more detail. As always, I welcome your comments and feedback.

Saturday, February 12, 2011

Nexus 7000 + Fabric Extenders = Scalable Access Layer

One of the most difficult components in any data center architecture to design and plan for is the access layer. In a traditional network hierarchy the access layer is where the most dynamic and changing requirements exist. Myriad technologies abound and can tell a history of the data center as new technologies were introduced with the progression from 100Mb Ethernet to 1G to 10G and the emergence of Unified Fabric (FCoE). Scaling these access layers has been a black art at times because of the changing pace of technology. What if you could have an access layer that meets your current 100Mb/1G Ethernet needs as well as 10G, provides a reduction in management points and helps tame the Spanning Tree beast? Enter the Nexus 7000 with support for Nexus 2000 Fabric Extenders (FEX).

The Nexus 7000s have been shipping for close to 3 years now, have a well established install base and mature software, and have proven themselves as scalable Data Center platforms. The Nexus 2000 has been shipping for over 2 years and has been solving access layer challenges for customers very well when paired with the Nexus 5000 switch. Combining the two technologies provides benefits similar to the traditional FEX architectures, only on a larger scale. Today the Nexus 5000 series supports up to a maximum of 16 FEX while the Nexus 7000 supports 32 with current code and plans for more in the future. Let’s dig into the details.

First, what are the requirements for FEX support on the Nexus 7000? Three primary requirements must be met:
1. NX-OS 5.1(1) or higher must be installed on the Nexus 7000
2. 32 port M1 10GE modules (part number)
3. EPLD must be current to support VNTag

Once these requirements are met we can connect the FEX to the Nexus 7000. The options supported include traditional 10G Short Reach (SR) and 10G Long Reach (LR) optics and the Fabric Extender Transceiver (FET) for the M1 32 port card. The M1 32 “L” card adds support for active Twinax cables, which currently are available in 7 and 10 meter lengths. In our example, we’ll be using SR optics.

Let’s start by verifying we meet the requirements.
We see below we are running NX-OS 5.1(2) so we’re good to go there.
cmhlab-dc2-sw2-agg1# show ver
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Documents: http://www.cisco.com/en/US/products/ps9372/tsd_products_support_series_home.html
Copyright (c) 2002-2010, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php

Software
BIOS: version 3.22.0
kickstart: version 5.1(2)
system: version 5.1(2)
BIOS compile time: 02/20/10
kickstart image file is: bootflash:///n7000-s1-kickstart.5.1.2.bin
kickstart compile time: 12/25/2020 12:00:00 [12/18/2010 09:55:20]
system image file is: bootflash:///n7000-s1-dk9.5.1.2.bin
system compile time: 11/29/2010 12:00:00 [12/18/2010 11:02:00]

We also have the correct modules installed
cmhlab-dc2-sw2-agg1# show mod
Mod Ports Module-Type Model Status
--- ----- -------------------------------- ------------------ ------------
2 32 10 Gbps Ethernet Module N7K-M132XP-12 ok
3 32 10 Gbps Ethernet XL Module N7K-M132XP-12L ok
4 32 1/10 Gbps Ethernet Module N7K-F132XP-15 ok
5 0 Supervisor module-1X N7K-SUP1 active *
6 0 Supervisor module-1X N7K-SUP1 ha-standby
8 48 10/100/1000 Mbps Ethernet Module N7K-M148GT-11 ok
9 48 1000 Mbps Optical Ethernet Modul N7K-M148GS-11 ok
10 48 10/100/1000 Mbps Ethernet Module N7K-M148GT-11 ok

Now let’s check the EPLD
*NOTE* This must be done from the default VDC, and if an EPLD upgrade is required, it is disruptive, so plan accordingly.
cmhlab-dc2-sw2-otv1# install all epld bootflash:n7000-s1-epld.5.1.1.img

Compatibility check:
Module Type Upgradable Impact Reason
------ ---- ---------- ---------- ------
2 LC Yes disruptive Module Upgradable
3 LC Yes disruptive Module Upgradable
4 LC Yes disruptive Module Upgradable
5 SUP Yes disruptive Module Upgradable
6 SUP Yes disruptive Module Upgradable
8 LC Yes disruptive Module Upgradable
9 LC Yes disruptive Module Upgradable
10 LC Yes disruptive Module Upgradable
1 Xbar Yes disruptive Module Upgradable
2 Xbar Yes disruptive Module Upgradable
3 Xbar Yes disruptive Module Upgradable
4 Xbar Yes disruptive Module Upgradable
5 Xbar Yes disruptive Module Upgradable
1 FAN Yes disruptive Module Upgradable
2 FAN Yes disruptive Module Upgradable
3 FAN Yes disruptive Module Upgradable
4 FAN Yes disruptive Module Upgradable

Copy complete, now saving to disk (please wait)...
Retrieving EPLD versions... Please wait.

Images will be upgraded according to following table:
Module Type EPLD Running-Version New-Version Upg-Required
------ ---- ------------- --------------- ----------- ------------
2 LC Power Manager 4.008 4.008 No
2 LC IO 1.016 1.016 No
2 LC Forwarding Engine 1.006 1.006 No
2 LC FE Bridge(1) 186.006 186.006 No
2 LC FE Bridge(2) 186.006 186.006 No
2 LC Linksec Engine(1) 2.006 2.006 No
2 LC Linksec Engine(2) 2.006 2.006 No
2 LC Linksec Engine(3) 2.006 2.006 No
2 LC Linksec Engine(4) 2.006 2.006 No
2 LC Linksec Engine(5) 2.006 2.006 No
2 LC Linksec Engine(6) 2.006 2.006 No
2 LC Linksec Engine(7) 2.006 2.006 No
2 LC Linksec Engine(8) 2.006 2.006 No
3 LC Power Manager 4.008 4.008 No
3 LC IO 1.016 1.016 No
3 LC Forwarding Engine 1.006 1.006 No
3 LC FE Bridge(1) 186.006 186.006 No
3 LC Linksec Engine(1) 2.006 2.006 No
4 LC Power Manager 1.000 1.000 No
4 LC IO 0.045 0.045 No
5 SUP Power Manager 3.009 3.009 No
5 SUP IO 3.028 3.028 No
5 SUP Inband 1.008 1.008 No
5 SUP Local Bus CPLD 3.000 3.000 No
5 SUP CMP CPLD 6.000 6.000 No
6 SUP Power Manager 3.009 3.009 No
6 SUP IO 3.028 3.028 No
6 SUP Inband 1.008 1.008 No
6 SUP Local Bus CPLD 3.000 3.000 No
6 SUP CMP CPLD 6.000 6.000 No
8 LC Power Manager 5.006 5.006 No
8 LC IO 2.014 2.014 No
8 LC Forwarding Engine 1.006 1.006 No
9 LC Power Manager 4.008 4.008 No
9 LC IO 1.006 1.006 No
9 LC Forwarding Engine 1.006 1.006 No
9 LC SFP 1.004 1.004 No
10 LC Power Manager 5.006 5.006 No
10 LC IO 2.014 2.014 No
10 LC Forwarding Engine 1.006 1.006 No
1 Xbar Power Manager 2.010 2.010 No
2 Xbar Power Manager 2.010 2.010 No
3 Xbar Power Manager 2.010 2.010 No
4 Xbar Power Manager 2.010 2.010 No
5 Xbar Power Manager 2.010 2.010 No
1 FAN Fan Controller (1) 0.007 0.007 No
1 FAN Fan Controller (2) 0.007 0.007 No
2 FAN Fan Controller (1) 0.007 0.007 No
2 FAN Fan Controller (2) 0.007 0.007 No
3 FAN Fan Controller (1) 0.007 0.007 No
3 FAN Fan Controller (2) 0.007 0.007 No
4 FAN Fan Controller (1) 0.007 0.007 No
4 FAN Fan Controller (2) 0.007 0.007 No
All Modules are up to date.
cmhlab-dc2-sw2-otv1#

So we’re in good shape there, too. It’s like I’ve done this before….. :)
Now that we’re ready, we’ve cabled the FEX to the switch via port e3/1-4 and we’ll be creating a topology that looks like this.

First, we need to install the FEX feature set. This is a bit different than what we’ve done with features in the past and must be done from the default VDC.
cmhlab-dc2-sw2-otv1# show run | i fex
cmhlab-dc2-sw2-otv1# confi t
Enter configuration commands, one per line. End with CNTL/Z.
cmhlab-dc2-sw2-otv1(config)# install feature-set fex
cmhlab-dc2-sw2-otv1(config)# show run | i fex
install feature-set fex
allow feature-set fex
allow feature-set fex
allow feature-set fex
allow feature-set fex
cmhlab-dc2-sw2-otv1(config)#

Note that each VDC now has a config for allow feature-set fex.
Next, we’ll go to the VDC where we want the FEX configured and get it set up.
cmhlab-dc2-sw2-agg1# confi
Enter configuration commands, one per line. End with CNTL/Z.
cmhlab-dc2-sw2-agg1(config)# feature-set fex

Then we’ll define the FEX and specify the model. While this isn’t required because the FEX will identify itself to the Nexus switch, I think it makes the config more readable and is somewhat self documenting.

cmhlab-dc2-sw2-agg1(config)# fex 150
cmhlab-dc2-sw2-agg1(config-fex)# type n2248T
cmhlab-dc2-sw2-agg1(config-fex)# description FEX150-for-Agg1-VDC

Now we’ll configure the physical ports the FEX is connected into.

cmhlab-dc2-sw2-agg1(config-fex)# int e3/1-4
cmhlab-dc2-sw2-agg1(config-if-range)# desc FEX 150
cmhlab-dc2-sw2-agg1(config-if-range)# switchport
cmhlab-dc2-sw2-agg1(config-if-range)# switchport mode fex-fabric
cmhlab-dc2-sw2-agg1(config-if-range)# fex associate 150
cmhlab-dc2-sw2-agg1(config-if-range)# channel-group 150

Now that we’ve told the switch to treat the ports as fex-fabric ports and created a port channel, let’s bring it up.

cmhlab-dc2-sw2-agg1(config-if-range)# int po150
cmhlab-dc2-sw2-agg1(config-if)# desc Port Channel to FEX 150
cmhlab-dc2-sw2-agg1(config-if)# no shut

cmhlab-dc2-sw2-agg1(config-if)# int e3/1-4
cmhlab-dc2-sw2-agg1(config-if-range)# shut
cmhlab-dc2-sw2-agg1(config-if-range)# no shut
cmhlab-dc2-sw2-agg1(config-if-range)#
cmhlab-dc2-sw2-agg1(config-if-range)# 2011 Feb 12 18:08:23 cmhlab-dc2-sw2-agg1 %FEX-2-NOHMS_ENV_FEX_ONLINE: FEX-150 On-line (Serial Number JAF1440BDFR)

It’s that simple.

cmhlab-dc2-sw2-agg1# show fex
FEX FEX FEX FEX
Number Description State Model Serial
------------------------------------------------------------------------
150 FEX150-for-Agg1-VDC Online N2K-C2248TP-1GE JAF1440BDFR
cmhlab-dc2-sw2-agg1#
cmhlab-dc2-sw2-agg1# show fex 150
FEX: 150 Description: FEX150-for-Agg1-VDC state: Online
FEX version: 5.1(2) [Switch version: 5.1(2)]
Extender Model: N2K-C2248TP-1GE, Extender Serial: JAF1440BDFR
Part No: 73-12748-05
pinning-mode: static Max-links: 1
Fabric port for control traffic: Eth3/1
Fabric interface state:
Po150 - Interface Up. State: Active
Eth3/1 - Interface Up. State: Active
Eth3/2 - Interface Up. State: Active
Eth3/3 - Interface Up. State: Active
Eth3/4 - Interface Up. State: Active
cmhlab-dc2-sw2-agg1#

If we look at the port channel we created, it looks like any other port channel.

cmhlab-dc2-sw2-agg1# show int po150
port-channel150 is up
Hardware: Port-Channel, address: c471.feee.c924 (bia c471.feee.c924)
Description: Port Channel to FEX 150
MTU 1500 bytes, BW 40000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA
Port mode is fex-fabric
full-duplex, 10 Gb/s
Input flow-control is off, output flow-control is off
Switchport monitor is off
EtherType is 0x8100
Members in this channel: Eth3/1, Eth3/2, Eth3/3, Eth3/4
Last clearing of "show interface" counters never
30 seconds input rate 124432 bits/sec, 12 packets/sec
30 seconds output rate 23272 bits/sec, 20 packets/sec

With the FEX being on-line, we now have 48 additional interfaces available to configure.

cmhlab-dc2-sw2-agg1# show int brief | i 150
mgmt0 -- up 10.0.2.13 1000 1500
Eth3/1 1 eth fabric up none 10G(S) 150
Eth3/2 1 eth fabric up none 10G(S) 150
Eth3/3 1 eth fabric up none 10G(S) 150
Eth3/4 1 eth fabric up none 10G(S) 150
Po150 1 eth fabric up none a-10G(S) none
Eth150/1/1 1 eth access down Administratively down auto(D) --
Eth150/1/2 1 eth access down Administratively down auto(D) --
Eth150/1/3 1 eth access down Administratively down auto(D) --
Eth150/1/4 1 eth access down Administratively down auto(D) --
Eth150/1/5 1 eth access down Administratively down auto(D) --
Eth150/1/6 1 eth access down Administratively down auto(D) --
Eth150/1/7 1 eth access down Administratively down auto(D) --
Eth150/1/8 1 eth access down Administratively down auto(D) --
Eth150/1/9 1 eth access down Administratively down auto(D) --
Eth150/1/10 1 eth access down Administratively down auto(D) --
Eth150/1/11 1 eth access down Administratively down auto(D) --
Eth150/1/12 1 eth access down Administratively down auto(D) --
Eth150/1/13 1 eth access down Administratively down auto(D) --
Eth150/1/14 1 eth access down Administratively down auto(D) --
Eth150/1/15 1 eth access down Administratively down auto(D) --
Eth150/1/16 1 eth access down Administratively down auto(D) --
Eth150/1/17 1 eth access down Administratively down auto(D) --
Eth150/1/18 1 eth access down Administratively down auto(D) --
Eth150/1/19 1 eth access down Administratively down auto(D) --
Eth150/1/20 1 eth access down Administratively down auto(D) --
Eth150/1/21 1 eth access down Administratively down auto(D) --
Eth150/1/22 1 eth access down Administratively down auto(D) --
Eth150/1/23 1 eth access down Administratively down auto(D) --
Eth150/1/24 1 eth access down Administratively down auto(D) --
Eth150/1/25 1 eth access down Administratively down auto(D) --
Eth150/1/26 1 eth access down Administratively down auto(D) --
Eth150/1/27 1 eth access down Administratively down auto(D) --
Eth150/1/28 1 eth access down Administratively down auto(D) --
Eth150/1/29 1 eth access down Administratively down auto(D) --
Eth150/1/30 1 eth access down Administratively down auto(D) --
Eth150/1/31 1 eth access down Administratively down auto(D) --
Eth150/1/32 1 eth access down Administratively down auto(D) --
Eth150/1/33 1 eth access down Administratively down auto(D) --
Eth150/1/34 1 eth access down Administratively down auto(D) --
Eth150/1/35 1 eth access down Administratively down auto(D) --
Eth150/1/36 1 eth access down Administratively down auto(D) --
Eth150/1/37 1 eth access down Administratively down auto(D) --
Eth150/1/38 1 eth access down Administratively down auto(D) --
Eth150/1/39 1 eth access down Administratively down auto(D) --
Eth150/1/40 1 eth access down Administratively down auto(D) --
Eth150/1/41 1 eth access down Administratively down auto(D) --
Eth150/1/42 1 eth access down Administratively down auto(D) --
Eth150/1/43 1 eth access down Administratively down auto(D) --
Eth150/1/44 1 eth access down Administratively down auto(D) --
Eth150/1/45 1 eth access down Administratively down auto(D) --
Eth150/1/46 1 eth access down Administratively down auto(D) --
Eth150/1/47 1 eth access down Administratively down auto(D) --
Eth150/1/48 1 eth access down Administratively down auto(D) --
cmhlab-dc2-sw2-agg1#

Note that today we cannot have a FEX multi-homed into two Nexus 7000s like we can on the Nexus 5000. Look for that capability in a future release along with support for additional FEX platforms.

When you think of the scale (32 FEX x 48 ports = 1,536 ports), that’s pretty impressive. By taking advantage of the cable savings of localized, in-rack cabling without the challenges of increased STP diameter, the FEX and Nexus 7000 make a powerful impact on the data center topology.
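
For anyone who wants to play with the numbers, here is a tiny Python sketch of the scale math along with the fabric oversubscription for the FEX built in this post (48 x 1G host ports behind a 4 x 10G port channel). The function names are made up, and the oversubscription figure is simple arithmetic on port speeds rather than a measured value.

```python
def access_ports(fex_count: int, ports_per_fex: int = 48) -> int:
    """Total host-facing ports across all FEX."""
    return fex_count * ports_per_fex


def oversubscription(host_ports: int, host_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Ratio of host-facing bandwidth to fabric uplink bandwidth."""
    return (host_ports * host_gbps) / (uplinks * uplink_gbps)


if __name__ == "__main__":
    print("Nexus 7000, 32 FEX:", access_ports(32), "ports")
    print("Nexus 5000, 16 FEX:", access_ports(16), "ports")
    print("FEX 150 oversubscription:", oversubscription(48, 1, 4, 10))
```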

As always, I welcome your comments and feedback.

Sunday, January 23, 2011

The Joys of ISSU on Nexus 7000

How many times have you had to fill out a change control document to upgrade code on your network devices where you've detailed the redundancy, portions of the networks impacted, application owners notified only to have it rejected due to "impact"? Prior to my current job at Cisco, this was a common theme. I wished I had a device that would let me roll code without impacting traffic. Fast forward a few years and my wishes have come true with In Service Software Upgrade (ISSU) within NX-OS.

A brief history lesson - Storage switches have had this capability for a long time in the higher end platforms that are considered director class. It makes sense to have ISSU functionality on fibre channel switches because fibre channel as a protocol relies on the network to guarantee delivery of frames. Dropping frames means bad things for storage traffic. Moving the capability for ISSU to Ethernet/IP networks makes sense in a modern data center where high density virtualization and the "always on" mindset prevail. Networking teams have been clamoring for ISSU for a long time. Let's face it, rolling code isn't one of the more exciting things to do on a network, but it's a necessary function. The good news is that we now have it.

We'll focus on ISSU on the Nexus series of devices, though other products in Cisco's portfolio support it as well. To provide a hitless upgrade capability the device and software require an intrinsic separation of the control plane and data plane. This allows changes to be made in the control plane, like software version, without affecting the data plane, through which the packets and frames that traverse the device pass. NX-OS has been engineered from day one to have this separation of planes. Couple that with years of experience in ISSU on the Cisco MDS, and one of my favorite features of NX-OS is born.

So enough talk, let's get into the action. To start an ISSU we use the install all command as shown below where we specify the kickstart image and system image to use.

cmhlab-dc2-sw2-otv1# install all kick bootflash:n7000-s1-kickstart.5.1.2.bin system bootflash:n7000-s1-dk9.5.1.2.bin

During the process the install happens before your eyes, which is great for the paranoid amongst us. :)

Various components are extracted from the kickstart and system files, and verified to minimize the potential for corruption. Below is a sample of the output.

Verifying image bootflash:/n7000-s1-kickstart.5.1.2.bin for boot variable "kickstart".

[####################] 100% -- SUCCESS

Verifying image bootflash:/n7000-s1-dk9.5.1.2.bin for boot variable "system".
[####################] 100% -- SUCCESS

Verifying image type.

[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.

[####################] 100% -- SUCCESS

Extracting "bios" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS

Extracting "system" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS

Extracting "kickstart" version from image bootflash:/n7000-s1-kickstart.5.1.2.bin.

[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS

Extracting "cmp" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS

Extracting "cmp-bios" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.

[####################] 100% -- SUCCESS

Performing module support checks
[####################] 100% -- SUCCESS

Notifying services about system upgrade.

[####################] 100% -- SUCCESS
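If you capture this output to a file, it is easy to confirm that every step reported SUCCESS without scrolling through it by eye. Below is a minimal Python sketch of that idea. The function names are my own, and I am assuming that any status word other than SUCCESS means trouble; I have not captured what a failed step actually prints, so treat that part as a guess.

```python
import re

# A few lines copied from the install output above.
SAMPLE = """\
Verifying image bootflash:/n7000-s1-kickstart.5.1.2.bin for boot variable "kickstart".
[####################] 100% -- SUCCESS
Verifying image type.
[####################] 100% -- SUCCESS
Performing module support checks
[####################] 100% -- SUCCESS
"""

# Matches a progress line such as "[####################] 100% -- SUCCESS".
STATUS = re.compile(r"^\[#*\s*\]\s+\d+%\s+--\s+(\w+)$")


def install_steps(text):
    """Pair each step description with the status word printed after it."""
    steps = []
    pending = None  # most recent description still waiting for a status line
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = STATUS.match(line)
        if match and pending is not None:
            steps.append((pending, match.group(1)))
            pending = None
        elif not match:
            pending = line
    return steps


def failed_steps(text):
    """Return descriptions whose status is anything other than SUCCESS (assumption)."""
    return [desc for desc, status in install_steps(text) if status != "SUCCESS"]


if __name__ == "__main__":
    for desc, status in install_steps(SAMPLE):
        print(f"{status:8} {desc}")
    print("failed steps:", failed_steps(SAMPLE))
```

The blank lines in the captured output are skipped, so it does not matter whether the terminal inserted them or not.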

Once that is completed, the install routine also shows the type of upgrade per module, reflecting a rolling upgrade for line cards and reset for the supervisors. Rolling upgrades are non-disruptive as the modules have been engineered to provide this functionality and not drop link to ports or disrupt switching.

Compatibility check is done:

Module  bootable          Impact  Install-type  Reason
------  --------  --------------  ------------  ------
     2       yes  non-disruptive       rolling
     5       yes  non-disruptive         reset
     6       yes  non-disruptive         reset
     9       yes  non-disruptive       rolling
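For a change control document, it can be handy to turn that compatibility table into something a script can check before anyone gives the green light. Here is a rough Python sketch that parses the table and lists any module whose impact is not "non-disruptive". The function and field names are mine, not anything NX-OS provides, and the parser only assumes the column order shown above.

```python
# The compatibility table from the install output above.
SAMPLE = """\
Module  bootable          Impact  Install-type  Reason
------  --------  --------------  ------------  ------
     2       yes  non-disruptive       rolling
     5       yes  non-disruptive         reset
     6       yes  non-disruptive         reset
     9       yes  non-disruptive       rolling
"""


def parse_compat(text):
    """Turn the compatibility table into a list of per-module dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a module number; the header and rule lines do not.
        if len(fields) >= 4 and fields[0].isdigit():
            rows.append({
                "module": int(fields[0]),
                "bootable": fields[1] == "yes",
                "impact": fields[2],
                "install_type": fields[3],
                "reason": " ".join(fields[4:]),  # empty when no reason is given
            })
    return rows


def disruptive_modules(rows):
    """Modules whose impact column is anything other than non-disruptive."""
    return [row["module"] for row in rows if row["impact"] != "non-disruptive"]


if __name__ == "__main__":
    rows = parse_compat(SAMPLE)
    for row in rows:
        print(row["module"], row["impact"], row["install_type"])
    print("disruptive modules:", disruptive_modules(rows))
```

Because the parser splits on whitespace, it works whether the columns are neatly aligned or collapsed to single spaces by a copy and paste.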

Finally, the installer presents a nice table showing the details of the upgrade and waits for the green light to continue.

Of course we want to proceed and then we see this output.

Install is in progress, please wait.

Performing runtime checks.

[####################] 100% -- SUCCESS

Syncing image bootflash:/n7000-s1-kickstart.5.1.2.bin to standby.

[####################] 100% -- SUCCESS

Syncing image bootflash:/n7000-s1-dk9.5.1.2.bin to standby.
[####################] 100% -- SUCCESS

*NOTE* The install routine automatically copies the files to the redundant supervisor for you.

Setting boot variables.
[####################] 100% -- SUCCESS

Performing configuration copy.
[####################] 100% -- SUCCESS

Module 2: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 5: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 6: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 9: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 6: Waiting for module online.
-- SUCCESS
Notifying services about the switchover.
[####################] 100% -- SUCCESS
"Switching over onto standby".
Connection closed by foreign host.

At this point, the supervisor that was the secondary (module 6 in my example) has reloaded and come up with the new code. This triggers the primary to initiate a Stateful Switchover (SSO) to the new code running in the control plane. Meanwhile, data is still traversing the switch with no impact. :)

Since our telnet session was disconnected during the SSO (telnet isn't SSO aware), we need to re-establish the session and issue a command to continue monitoring the upgrade.

rfuller@cmhlab-tools:~$ telnet cmhlab-dc2-sw2-otv1

Trying 10.2.0.4...

Connected to cmhlab-dc2-sw2-otv1.csc.dublin.cisco.com.

Escape character is '^]'.

User Access Verification
login: admin
Password:
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2010, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php

cmhlab-dc2-sw2-otv1# show install all status
There is an on-going installation...
Enter Ctrl-C to go back to the prompt.
Continuing with installation, please wait

Trying to start the installer...
Module 6: Waiting for module online.
-- SUCCESS
2011 Jan 24 02:34:55 cmhlab-dc2-sw2-otv1 %IDEHSD-STANDBY-2-MOUNT: slot0: online
2011 Jan 24 02:35:06 cmhlab-dc2-sw2-otv1 %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 5) is now UP
2011 Jan 24 02:37:55 cmhlab-dc2-sw2-otv1 %IDEHSD-STANDBY-2-MOUNT: logflash: online

Module 2: Non-disruptive upgrading.
-- SUCCESS
Module 9: Non-disruptive upgrading.
-- SUCCESS
Install has been successful.
With that, we've upgraded our NX-OS, had the system automatically copy the files to the right locations and modify the boot variables, and we didn't drop a frame. How's that for hot?

cmhlab-dc2-sw2-otv1# show ver | i uptime

Kernel uptime is 0 day(s), 0 hour(s), 26 minute(s), 50 second(s)

*NOTE* The kernel has been up for only a short while, but we'll see that the overall system has been up much longer.

cmhlab-dc2-sw2-otv1# show ver | i version

the GNU General Public License (GPL) version 2.0 or the GNU

BIOS: version 3.22.0
kickstart: version 5.1(2)
system: version 5.1(2)

cmhlab-dc2-sw2-otv1# show system uptime
System start time: Tue Oct 26 19:46:38 2010
System uptime: 89 days, 6 hours, 56 minutes, 26 seconds
Kernel uptime: 0 days, 0 hours, 29 minutes, 16 seconds
Active supervisor uptime: 0 days, 0 hours, 19 minutes, 56 seconds

cmhlab-dc2-sw2-otv1#
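The comparison between system uptime and kernel uptime is the proof that the upgrade was hitless, so it is a nice thing to check in a script. The Python sketch below parses the show system uptime output above into durations and compares them. The names are my own, and it assumes the "N days, N hours, N minutes, N seconds" format shown here.

```python
import re
from datetime import timedelta

# The "show system uptime" output from above.
SAMPLE = """\
System start time: Tue Oct 26 19:46:38 2010
System uptime: 89 days, 6 hours, 56 minutes, 26 seconds
Kernel uptime: 0 days, 0 hours, 29 minutes, 16 seconds
Active supervisor uptime: 0 days, 0 hours, 19 minutes, 56 seconds
"""

# Matches lines such as "Kernel uptime: 0 days, 0 hours, 29 minutes, 16 seconds".
UPTIME = re.compile(
    r"^(?P<name>[\w ]+) uptime:\s+(?P<d>\d+) days, (?P<h>\d+) hours, "
    r"(?P<m>\d+) minutes, (?P<s>\d+) seconds$"
)


def parse_uptimes(text):
    """Map each uptime label (lowercased) to a timedelta."""
    result = {}
    for line in text.splitlines():
        match = UPTIME.match(line.strip())
        if match:
            result[match.group("name").lower()] = timedelta(
                days=int(match.group("d")),
                hours=int(match.group("h")),
                minutes=int(match.group("m")),
                seconds=int(match.group("s")),
            )
    return result


if __name__ == "__main__":
    uptimes = parse_uptimes(SAMPLE)
    # The system kept running while the kernel restarted on the new code.
    survived = uptimes["system"] > uptimes["kernel"]
    print("system outlived the kernel restart:", survived)
```

The "System start time" line does not match the pattern and is simply ignored.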

We'll cover ISSU on the Nexus 5000 and Nexus 1000V in the future. I hope this was informative.

Tuesday, January 18, 2011

Here we go.....

I finally decided I needed to do some blogging, so here we go. Before we get into the fun stuff, let's talk a bit about who I am. This will help you decide if you are in the right place or not.

My name is Ron Fuller and I work as a Technology Solutions Architect with Cisco in Dublin, Ohio. I work with our Enterprise customers on data center architecture, which means I'm not a product guy per se. Architectures can be enabled by a product or suite of products, though I happen to think some enable them better than others. ;) I am a dual CCIE #5851 (Routing and Switching and Storage Networking) and have held a myriad of certifications from other vendors including Novell (where I started my certification track and was a Master CNE), VMware, SNIA, Microsoft, HP, Okidata, IBM, ISC2, CompTIA and more. Certifications were a focal point for me early in my career and certainly opened doors that would have otherwise remained closed in tough times.

I have had the opportunity to be published a few times, and my latest effort was a collaboration with two great guys whom I am lucky to call friends as well, David Jansen and Kevin Corbin. We created NX-OS and Cisco Nexus Switching: Next-Generation Data Center Architectures with Cisco Press. The book was released last June and we're already working on a 2nd Edition because of the many changes and innovations NX-OS has brought to market in the last few months and those coming! I have a passion for NX-OS and if you've been following me on Twitter (@ccie5851) you might have picked up on it. ;) I have a sticker on my laptop that says it all.

On a personal front, my wife and I have four awesome, smart, creative, cute....you get the picture...kids. We live north of Columbus, OH and love to travel WITH the kids, especially if there is an F1 race involved. We've become very adept at long-haul travel with kids and have taken them with us to Japan, England, France, Germany, Australia and our last big adventure, China. I may blog about the science of traveling with little ones in the future. We think we've got a good system but may be biased.

As I mentioned earlier, F1 is a great excuse to travel and for that matter, I'm a fan of most autosports though F1 holds a special place in my heart. It is the perfect integration of technology (I'm a geek after all!) and speed, exotic locations and competition. I do watch Indycar and it's probably best to say I monitor NASCAR. NASCAR has so many races and they are so long that it becomes quite the commitment to actually WATCH every race. I still miss the days of Dale and Rusty beating and banging on each other, but as with all things, change happens.

I'm sure more of my idiosyncrasies will emerge as I write, but know that I plan to discuss NX-OS and Nexus switching, some UCS action, MDS and whatever else comes up. It's an exciting time in the Data Center space and I couldn't be happier to be hip-deep in the action!

Thanks for taking the time and see you around.