Sunday, February 20, 2011

OTV Deep Dive - Part Two

Now that we've covered OTV theory and nomenclature, let's dig into the fun stuff and talk about the CLI and what OTV looks like when it's set up. We'll be using the topology below, consisting of four Nexus 7000s and eight VDCs.

[Topology diagram: four Nexus 7000s and eight VDCs]

We'll focus first on the minimum configuration required to get basic OTV adjacency up and working, and then add in multi-homing for redundancy. First, make sure the L3 network that OTV will be traversing is multicast enabled. With current shipping code, neighbor discovery is done via multicast, which helps facilitate easy addition and removal of sites from the OTV network. With this requirement met, we can get rolling.
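As a side note, if the transit network happens to be NX-OS as well, meeting that multicast requirement might look something like this sketch. It's only an illustration; the interface, RP address and group range are assumptions, not taken from the lab.

feature pim
! hypothetical static RP covering the OTV control and data groups
ip pim rp-address 10.100.255.1 group-list 239.192.0.0/16
interface Ethernet2/1
  ip pim sparse-mode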

A simple initial config is below and we'll dissect it.

First, we enable the feature
feature otv

Then we configure the Overlay interface
interface Overlay1

Next we configure the join interface. This is the interface that issues the IGMP joins and whose IP address becomes the source of all packets after encapsulation.
otv join-interface Ethernet1/7.1

Now we'll configure the control group. As its name implies, the control group is the multicast group used by all OTV speakers in an Overlay network. This should be a unique multicast group in the multicast network.
otv control-group 239.192.1.1

Then we configure the data group, which is used to encapsulate any L2 multicast traffic being extended across the Overlay. Any L3 multicast will be routed off of the VLAN through whatever regular multicast mechanisms exist on the network.
otv data-group 239.192.2.0/24

The next-to-last piece of bare-minimum config to add is the list of VLANs to be extended.
otv extend-vlan 31-33,1000,1010,1088-1089

Finally, no shut to enable the interface.
no shutdown
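Putting those pieces together, the complete minimum config on each edge device looks like this (the join interface will differ per device, of course):

feature otv

interface Overlay1
  otv join-interface Ethernet1/7.1
  otv control-group 239.192.1.1
  otv data-group 239.192.2.0/24
  otv extend-vlan 31-33,1000,1010,1088-1089
  no shutdown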


We can now look at the Overlay interface, though honestly we won't see much yet; checking is just force of habit after a no shut on an interface. :)

show int o1
Overlay1 is up
BW 1000000 Kbit
Last clearing of "show interface" counters never
RX
0 unicast packets 77420 multicast packets
77420 input packets 574 bits/sec 0 packets/sec
TX
0 unicast packets 0 multicast packets
0 output packets 0 bits/sec 0 packets/sec

If we configure the other hosts in our network and multicast is working, we'll see adjacencies form as below.

champs1-OTV# show otv adj

Overlay Adjacency database

Overlay-Interface Overlay1 :
Hostname System-ID Dest Addr Up Time State
champs2-OTV 001b.54c2.41c4 10.100.251.14 2d05h UP
fresca-OTV 0026.9822.ea44 10.100.251.78 2d05h UP
pepsi-OTV f866.f206.fd44 10.100.251.82 2d05h UP

champs1-OTV#


With this in place, we now have a basic config and will be able to extend VLANs between the four devices.
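A few other verification commands come in handy at this point as well; a quick sampling is below (output omitted here for brevity):

show otv            ! summary of the Overlay, control/data groups and extended VLANs
show otv adjacency  ! the neighbor view shown above
show otv vlan       ! per-VLAN AED status, which we'll look at shortly
show otv route      ! MAC addresses learned across the Overlay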

The last thing we'll cover in this post is how multi-homing can be enabled. First, to level set: multi-homing in this context refers to the ability to have redundancy in each site without creating a crippling loop.

The way this is accomplished in OTV is through the concept of a site VLAN. The site VLAN is a VLAN that's dedicated to OTV and NOT extended across the Overlay, but is trunked between the two OTV edge devices. This VLAN doesn't need any IP addresses or SVIs; it just needs to exist and be added to the OTV config as shown below.

otv site-vlan 99
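In addition to that command, the site VLAN itself needs to be created and carried on the link between the two OTV edge devices. A quick sketch is below; the inter-edge trunk interface (Ethernet1/10) is an assumption for illustration.

vlan 99
  name OTV-site-vlan

! hypothetical trunk between the two OTV edge devices carrying the site VLAN
interface Ethernet1/10
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 99
  no shutdown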

With the simple addition of this command, the OTV edge devices will discover each other locally and then use an algorithm to determine the role each edge device will assume on a per-VLAN basis. This role is called the Authoritative Edge Device (AED). The AED is responsible for forwarding all traffic for a given VLAN, including broadcast and multicast traffic. Today the algorithm aligns with the VLAN ID: one edge device supports the odd-numbered VLANs and the other supports the even-numbered VLANs. This can be seen in the output below.

champs1-OTV# show otv vlan


OTV Extended VLANs and Edge Device State Information (* - AED)

VLAN Auth. Edge Device Vlan State Overlay
---- ----------------------------------- ---------- -------
31* champs1-OTV active Overlay1
32 champs2-OTV inactive(Non AED) Overlay1
33* champs1-OTV active Overlay1

1000 champs2-OTV inactive(Non AED) Overlay1
1010 champs2-OTV inactive(Non AED) Overlay1
1088 champs2-OTV inactive(Non AED) Overlay1
1089* champs1-OTV active Overlay1


If we look at the output above, we can see that this edge device is the AED for VLANs 31, 33 and 1089 and is the non-AED for 32, 1000, 1010 and 1088. In the event of a failure of champs2, champs1 will take over and become the AED for all VLANs.

We'll explore FHRP localization and what happens across the OTV control group in the next post. As always, your thoughts, comments and feedback are welcome.

Wednesday, February 16, 2011

OTV Deep Dive - Part One

I've been meaning to do this for a long time, and now that I have the blog and am awake in the hotel room at 3AM, what better thing to do than write about a technology I've been fortunate enough to work with for almost a year. This will be a series of posts, as I'd like to take a structured approach and dig into the details and mechanics as well as the operational aspects of the technology.

Overlay Transport Virtualization (OTV) is a feature available on the Nexus 7000 series switches that enables extension of VLANs across Layer 3 networks. This opens up options for data center scale and design that have not been available in the past. The two common use cases I've worked with customers to implement are data center migration and workload mobility. Interestingly, many people jump straight to a multiple physical data center scenario, start to consider stretched clusters and worry about data sync issues. While OTV can provide value in those scenarios, it is also a valid solution inside the data center, where L3 interconnects may segment the network but the need for mobility is still present.

OTV is significant in its ability to provide this extension without the hassles and challenges associated with traditional Layer 2 extension, such as merging STP domains, MAC learning and flooding. OTV is designed to drop STP BPDUs across the Overlay interface, which means the STP domains on each side of the L3 network are not merged. This is significant in that it minimizes fate sharing, where an STP event in one domain ripples into other domains. Additionally, OTV uses IS-IS as its control plane to advertise MAC addresses and provide capabilities such as loop avoidance and optimized traffic handling. Finally, OTV doesn't have state that needs to be maintained, as is required with pseudowire transports like EoMPLS and VPLS. OTV is an encapsulating technology and as such adds a 42-byte header to each frame transported across the Overlay. Below is the frame format in more detail.

[Diagram: OTV frame format showing the 42-byte encapsulation header]

We'll start by defining the components and interfaces used when discussing OTV. Refer to the topology below.

[Topology diagram: Agg1 and Agg2 Nexus 7000s providing vPC to the OTV edge VDCs]

We have a typical data center aggregation layer based on the Nexus 7000, which is our boundary between Layer 2 and Layer 3. The two switches, Agg1 and Agg2, utilize a Nexus technology, virtual Port Channel (vPC), to provide multi-chassis EtherChannel (MCEC) to the OTV edge devices. In this topology, the OTV edge devices happen to be Virtual Device Contexts (VDCs) that share the same sheet metal as the Agg switches but are logically separate. We'll dig into VDCs more in future blog posts, but know that VDCs are a very, very powerful feature within NX-OS on the Nexus 7000.

Three primary interfaces are used in OTV. The internal interface, as its name implies, is internal to OTV and is where the VLANs to be extended are brought into the OTV network. These are normal Ethernet interfaces running at Layer 2 and can be trunks or access ports depending on your network's needs. It is important to note that the internal interfaces *DO* participate in STP, so considerations such as rootguard and appropriate STP prioritization should be taken into account. In most topologies you wouldn't want, or need, the OTV edge device to be the root, though if that works in your topology, OTV will work as desired.
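For reference, an internal interface ends up looking like an ordinary Layer 2 trunk. A sketch is below; the interface number is an assumption for illustration, and the VLAN list matches the extended VLANs from the earlier config.

interface Ethernet1/9
  description OTV internal interface - normal L2 trunk
  switchport
  switchport mode trunk
  switchport trunk allowed vlan 31-33,1000,1010,1088-1089
  no shutdown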

The next interface is the join interface, which is where the encapsulated L2 frames are placed on the L3 network for transport to the appropriate OTV edge device. The join interface has an IP address and behaves much like a client in that it issues IGMP joins for the OTV multicast control group. In some topologies it is desirable to have the join interface participate in a dynamic routing protocol, and that is not a problem either. As mentioned earlier, OTV encapsulates traffic and adds a 42-byte header to each packet, so it may be prudent to ensure your transit network can support packets larger than 1500 bytes. Though not a requirement, performance may suffer if jumbo frames are not supported.
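A minimal join interface sketch is below. The subinterface, dot1q tag, addressing and jumbo MTU are assumptions for illustration; the IGMPv3 setting on the join interface is required for the multicast-based control plane, and a 9216-byte MTU comfortably covers a 1500-byte frame plus the 42-byte OTV header (1542 bytes).

interface Ethernet1/7
  mtu 9216                      ! assumes the transit supports jumbo frames end to end
  no shutdown

interface Ethernet1/7.1
  encapsulation dot1q 251       ! hypothetical tag
  ip address 10.100.251.13/30   ! hypothetical addressing
  ip igmp version 3             ! required for the multicast control-group joins
  no shutdown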

Finally, the Overlay interface is where the OTV-specific configuration options are applied to define key attributes such as the multicast control group, the VLANs to be extended and the join interface. The Overlay interface is where the (in)famous 5 commands to enable OTV are entered, though anyone who's worked with the technology recognizes that more than 5 commands are needed for a successful implementation. :) The Overlay interface is similar to a Loopback interface in that it's a virtual interface.

In the next post, we'll discuss the initial OTV configuration and multi-homing capabilities in more detail. As always, I welcome your comments and feedback.

Saturday, February 12, 2011

Nexus 7000 + Fabric Extenders = Scalable Access Layer



One of the most difficult components to design and plan for in any data center architecture is the access layer. In a traditional network hierarchy the access layer is where the most dynamic and changing requirements exist. Myriad technologies abound and tell the history of the data center as it progressed from 100Mb Ethernet to 1G to 10G and on to the emergence of Unified Fabric (FCoE). Scaling these access layers has been a black art at times because of the changing pace of technology. What if you could have an access layer that meets your current 100M/1G Ethernet needs as well as 10G, provides a reduction in management points and helps tame the Spanning Tree beast? Enter the Nexus 7000 with support for Nexus 2000 Fabric Extenders (FEX).


The Nexus 7000s have been shipping for close to 3 years now and have a well-established install base, mature software and a proven track record as scalable data center platforms. The Nexus 2000 has been shipping for over 2 years and has been solving access layer challenges very well when paired with the Nexus 5000 switch. Combining the two technologies provides similar benefits to the traditional FEX architectures, only on a larger scale. Today the Nexus 5000 series supports a maximum of 16 FEX, while the Nexus 7000 supports 32 with current code and plans for more in the future. Let’s dig into the details.

First, what are the requirements for FEX support on the Nexus 7000? Three primary requirements must be met:
1. NX-OS 5.1(1) or higher must be installed on the Nexus 7000
2. 32-port M1 10GE modules (N7K-M132XP-12, as seen in the show module output below)
3. EPLD must be current to support VNTag

Once these requirements are met, we can connect the FEX to the Nexus 7000. The supported options include traditional 10G Short Reach (SR) and Long Reach (LR) optics and the Fabric Extender Transceiver (FET) for the M1 32-port card. The M1 32 “L” card adds support for active Twinax cables, which are currently available in 7 and 10M lengths. In our example, we’ll be using SR optics.


Let’s start by verifying we meet the requirements.
We can see below that we’re running NX-OS 5.1(2), so we’re good to go there.
cmhlab-dc2-sw2-agg1# show ver
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Documents: http://www.cisco.com/en/US/products/ps9372/tsd_products_support_series_home.html
Copyright (c) 2002-2010, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php

Software
BIOS: version 3.22.0
kickstart: version 5.1(2)
system: version 5.1(2)
BIOS compile time: 02/20/10
kickstart image file is: bootflash:///n7000-s1-kickstart.5.1.2.bin
kickstart compile time: 12/25/2020 12:00:00 [12/18/2010 09:55:20]
system image file is: bootflash:///n7000-s1-dk9.5.1.2.bin
system compile time: 11/29/2010 12:00:00 [12/18/2010 11:02:00]


We also have the correct modules installed
cmhlab-dc2-sw2-agg1# show mod
Mod Ports Module-Type Model Status
--- ----- -------------------------------- ------------------ ------------
2 32 10 Gbps Ethernet Module N7K-M132XP-12 ok
3 32 10 Gbps Ethernet XL Module N7K-M132XP-12L ok
4 32 1/10 Gbps Ethernet Module N7K-F132XP-15 ok
5 0 Supervisor module-1X N7K-SUP1 active *
6 0 Supervisor module-1X N7K-SUP1 ha-standby
8 48 10/100/1000 Mbps Ethernet Module N7K-M148GT-11 ok
9 48 1000 Mbps Optical Ethernet Modul N7K-M148GS-11 ok
10 48 10/100/1000 Mbps Ethernet Module N7K-M148GT-11 ok

Now let’s check the EPLD
*NOTE* This must be done from the default VDC, and if an EPLD upgrade is required, it is disruptive, so plan accordingly.
cmhlab-dc2-sw2-otv1# install all epld bootflash:n7000-s1-epld.5.1.1.img

Compatibility check:
Module Type Upgradable Impact Reason
------ ---- ---------- ---------- ------
2 LC Yes disruptive Module Upgradable
3 LC Yes disruptive Module Upgradable
4 LC Yes disruptive Module Upgradable
5 SUP Yes disruptive Module Upgradable
6 SUP Yes disruptive Module Upgradable
8 LC Yes disruptive Module Upgradable
9 LC Yes disruptive Module Upgradable
10 LC Yes disruptive Module Upgradable
1 Xbar Yes disruptive Module Upgradable
2 Xbar Yes disruptive Module Upgradable
3 Xbar Yes disruptive Module Upgradable
4 Xbar Yes disruptive Module Upgradable
5 Xbar Yes disruptive Module Upgradable
1 FAN Yes disruptive Module Upgradable
2 FAN Yes disruptive Module Upgradable
3 FAN Yes disruptive Module Upgradable
4 FAN Yes disruptive Module Upgradable

Copy complete, now saving to disk (please wait)...
Retrieving EPLD versions... Please wait.

Images will be upgraded according to following table:
Module Type EPLD Running-Version New-Version Upg-Required
------ ---- ------------- --------------- ----------- ------------
2 LC Power Manager 4.008 4.008 No
2 LC IO 1.016 1.016 No
2 LC Forwarding Engine 1.006 1.006 No
2 LC FE Bridge(1) 186.006 186.006 No
2 LC FE Bridge(2) 186.006 186.006 No
2 LC Linksec Engine(1) 2.006 2.006 No
2 LC Linksec Engine(2) 2.006 2.006 No
2 LC Linksec Engine(3) 2.006 2.006 No
2 LC Linksec Engine(4) 2.006 2.006 No
2 LC Linksec Engine(5) 2.006 2.006 No
2 LC Linksec Engine(6) 2.006 2.006 No
2 LC Linksec Engine(7) 2.006 2.006 No
2 LC Linksec Engine(8) 2.006 2.006 No
3 LC Power Manager 4.008 4.008 No
3 LC IO 1.016 1.016 No
3 LC Forwarding Engine 1.006 1.006 No
3 LC FE Bridge(1) 186.006 186.006 No
3 LC Linksec Engine(1) 2.006 2.006 No

4 LC Power Manager 1.000 1.000 No
4 LC IO 0.045 0.045 No
5 SUP Power Manager 3.009 3.009 No
5 SUP IO 3.028 3.028 No
5 SUP Inband 1.008 1.008 No
5 SUP Local Bus CPLD 3.000 3.000 No
5 SUP CMP CPLD 6.000 6.000 No
6 SUP Power Manager 3.009 3.009 No
6 SUP IO 3.028 3.028 No
6 SUP Inband 1.008 1.008 No
6 SUP Local Bus CPLD 3.000 3.000 No
6 SUP CMP CPLD 6.000 6.000 No
8 LC Power Manager 5.006 5.006 No
8 LC IO 2.014 2.014 No
8 LC Forwarding Engine 1.006 1.006 No
9 LC Power Manager 4.008 4.008 No
9 LC IO 1.006 1.006 No
9 LC Forwarding Engine 1.006 1.006 No
9 LC SFP 1.004 1.004 No
10 LC Power Manager 5.006 5.006 No
10 LC IO 2.014 2.014 No
10 LC Forwarding Engine 1.006 1.006 No
1 Xbar Power Manager 2.010 2.010 No
2 Xbar Power Manager 2.010 2.010 No
3 Xbar Power Manager 2.010 2.010 No
4 Xbar Power Manager 2.010 2.010 No
5 Xbar Power Manager 2.010 2.010 No
1 FAN Fan Controller (1) 0.007 0.007 No
1 FAN Fan Controller (2) 0.007 0.007 No
2 FAN Fan Controller (1) 0.007 0.007 No
2 FAN Fan Controller (2) 0.007 0.007 No
3 FAN Fan Controller (1) 0.007 0.007 No
3 FAN Fan Controller (2) 0.007 0.007 No
4 FAN Fan Controller (1) 0.007 0.007 No
4 FAN Fan Controller (2) 0.007 0.007 No
All Modules are up to date.
cmhlab-dc2-sw2-otv1#


So we’re in good shape there, too. It’s like I’ve done this before….. :)
Now that we’re ready, we’ve cabled the FEX to the switch via ports e3/1-4, and we’ll be creating a topology that looks like this.

[Topology diagram: FEX 150 (N2K-C2248TP) connected to the Nexus 7000 via ports e3/1-4]

First, we need to install the FEX feature set. This is a bit different from what we’ve done with features in the past and must be done from the default VDC.
cmhlab-dc2-sw2-otv1# show run | i fex
cmhlab-dc2-sw2-otv1# confi t
Enter configuration commands, one per line. End with CNTL/Z.
cmhlab-dc2-sw2-otv1(config)# install feature-set fex
cmhlab-dc2-sw2-otv1(config)# show run | i fex
install feature-set fex
allow feature-set fex
allow feature-set fex
allow feature-set fex
allow feature-set fex
cmhlab-dc2-sw2-otv1(config)#

Note that each VDC now has a config for allow feature-set fex.
Next, we’ll go to our VDC where we want the FEX configured and get it setup.
cmhlab-dc2-sw2-agg1# confi
Enter configuration commands, one per line. End with CNTL/Z.
cmhlab-dc2-sw2-agg1(config)# feature-set fex

Then we’ll define the FEX and specify the model. While this isn’t required, since the FEX will identify itself to the Nexus switch, I think it makes the config more readable and somewhat self-documenting.

cmhlab-dc2-sw2-agg1(config)# fex 150
cmhlab-dc2-sw2-agg1(config-fex)# type n2248T
cmhlab-dc2-sw2-agg1(config-fex)# description FEX150-for-Agg1-VDC

Now we’ll configure the physical ports the FEX is connected into.

cmhlab-dc2-sw2-agg1(config-fex)# int e3/1-4
cmhlab-dc2-sw2-agg1(config-if-range)# desc FEX 150
cmhlab-dc2-sw2-agg1(config-if-range)# switchport
cmhlab-dc2-sw2-agg1(config-if-range)# switchport mode fex-fabric
cmhlab-dc2-sw2-agg1(config-if-range)# fex associate 150
cmhlab-dc2-sw2-agg1(config-if-range)# channel-group 150

Now that we’ve told the switch to treat the ports as fex-fabric ports and created a port channel, let’s bring it up.

cmhlab-dc2-sw2-agg1(config-if-range)# int po150
cmhlab-dc2-sw2-agg1(config-if)# desc Port Channel to FEX 150
cmhlab-dc2-sw2-agg1(config-if)# no shut

cmhlab-dc2-sw2-agg1(config-if)# int e3/1-4
cmhlab-dc2-sw2-agg1(config-if-range)# shut
cmhlab-dc2-sw2-agg1(config-if-range)# no shut
cmhlab-dc2-sw2-agg1(config-if-range)#
cmhlab-dc2-sw2-agg1(config-if-range)# 2011 Feb 12 18:08:23 cmhlab-dc2-sw2-agg1 %FEX-2-NOHMS_ENV_FEX_ONLINE: FEX-150 On-line (Serial Number JAF1440BDFR)


It’s that simple.

cmhlab-dc2-sw2-agg1# show fex
FEX FEX FEX FEX
Number Description State Model Serial
------------------------------------------------------------------------
150 FEX150-for-Agg1-VDC Online N2K-C2248TP-1GE JAF1440BDFR
cmhlab-dc2-sw2-agg1#
cmhlab-dc2-sw2-agg1# show fex 150
FEX: 150 Description: FEX150-for-Agg1-VDC state: Online
FEX version: 5.1(2) [Switch version: 5.1(2)]
Extender Model: N2K-C2248TP-1GE, Extender Serial: JAF1440BDFR
Part No: 73-12748-05
pinning-mode: static Max-links: 1
Fabric port for control traffic: Eth3/1
Fabric interface state:
Po150 - Interface Up. State: Active
Eth3/1 - Interface Up. State: Active
Eth3/2 - Interface Up. State: Active
Eth3/3 - Interface Up. State: Active
Eth3/4 - Interface Up. State: Active
cmhlab-dc2-sw2-agg1#

If we look at the port channel we created, it looks like any other port channel.

cmhlab-dc2-sw2-agg1# show int po150
port-channel150 is up
Hardware: Port-Channel, address: c471.feee.c924 (bia c471.feee.c924)
Description: Port Channel to FEX 150
MTU 1500 bytes, BW 40000000 Kbit, DLY 10 usec
reliability 255/255, txload 1/255, rxload 1/255
Encapsulation ARPA
Port mode is fex-fabric
full-duplex, 10 Gb/s
Input flow-control is off, output flow-control is off
Switchport monitor is off
EtherType is 0x8100
Members in this channel: Eth3/1, Eth3/2, Eth3/3, Eth3/4
Last clearing of "show interface" counters never
30 seconds input rate 124432 bits/sec, 12 packets/sec
30 seconds output rate 23272 bits/sec, 20 packets/sec

With the FEX online, we now have 48 additional interfaces available to configure.

cmhlab-dc2-sw2-agg1# show int brief | i 150
mgmt0 -- up 10.0.2.13 1000 1500
Eth3/1 1 eth fabric up none 10G(S) 150
Eth3/2 1 eth fabric up none 10G(S) 150
Eth3/3 1 eth fabric up none 10G(S) 150
Eth3/4 1 eth fabric up none 10G(S) 150
Po150 1 eth fabric up none a-10G(S) none
Eth150/1/1 1 eth access down Administratively down auto(D) --
Eth150/1/2 1 eth access down Administratively down auto(D) --
Eth150/1/3 1 eth access down Administratively down auto(D) --
Eth150/1/4 1 eth access down Administratively down auto(D) --
Eth150/1/5 1 eth access down Administratively down auto(D) --
Eth150/1/6 1 eth access down Administratively down auto(D) --
Eth150/1/7 1 eth access down Administratively down auto(D) --
Eth150/1/8 1 eth access down Administratively down auto(D) --
Eth150/1/9 1 eth access down Administratively down auto(D) --
Eth150/1/10 1 eth access down Administratively down auto(D) --
Eth150/1/11 1 eth access down Administratively down auto(D) --
Eth150/1/12 1 eth access down Administratively down auto(D) --
Eth150/1/13 1 eth access down Administratively down auto(D) --
Eth150/1/14 1 eth access down Administratively down auto(D) --
Eth150/1/15 1 eth access down Administratively down auto(D) --
Eth150/1/16 1 eth access down Administratively down auto(D) --
Eth150/1/17 1 eth access down Administratively down auto(D) --
Eth150/1/18 1 eth access down Administratively down auto(D) --
Eth150/1/19 1 eth access down Administratively down auto(D) --
Eth150/1/20 1 eth access down Administratively down auto(D) --
Eth150/1/21 1 eth access down Administratively down auto(D) --
Eth150/1/22 1 eth access down Administratively down auto(D) --
Eth150/1/23 1 eth access down Administratively down auto(D) --
Eth150/1/24 1 eth access down Administratively down auto(D) --
Eth150/1/25 1 eth access down Administratively down auto(D) --
Eth150/1/26 1 eth access down Administratively down auto(D) --
Eth150/1/27 1 eth access down Administratively down auto(D) --
Eth150/1/28 1 eth access down Administratively down auto(D) --
Eth150/1/29 1 eth access down Administratively down auto(D) --
Eth150/1/30 1 eth access down Administratively down auto(D) --
Eth150/1/31 1 eth access down Administratively down auto(D) --
Eth150/1/32 1 eth access down Administratively down auto(D) --
Eth150/1/33 1 eth access down Administratively down auto(D) --
Eth150/1/34 1 eth access down Administratively down auto(D) --
Eth150/1/35 1 eth access down Administratively down auto(D) --
Eth150/1/36 1 eth access down Administratively down auto(D) --
Eth150/1/37 1 eth access down Administratively down auto(D) --
Eth150/1/38 1 eth access down Administratively down auto(D) --
Eth150/1/39 1 eth access down Administratively down auto(D) --
Eth150/1/40 1 eth access down Administratively down auto(D) --
Eth150/1/41 1 eth access down Administratively down auto(D) --
Eth150/1/42 1 eth access down Administratively down auto(D) --
Eth150/1/43 1 eth access down Administratively down auto(D) --
Eth150/1/44 1 eth access down Administratively down auto(D) --
Eth150/1/45 1 eth access down Administratively down auto(D) --
Eth150/1/46 1 eth access down Administratively down auto(D) --
Eth150/1/47 1 eth access down Administratively down auto(D) --
Eth150/1/48 1 eth access down Administratively down auto(D) --
cmhlab-dc2-sw2-agg1#
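
From here, the FEX ports are configured like any other access-layer ports on the switch. A quick sketch of bringing one up for a server is below; the port number, VLAN and description are assumptions for illustration.

interface Ethernet150/1/1
  description hypothetical server port on FEX 150
  switchport
  switchport mode access
  switchport access vlan 100
  spanning-tree port type edge
  no shutdown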


Note that today we cannot have a FEX multi-homed into two Nexus 7000s like we can on the Nexus 5000. Look for that capability in a future release along with support for additional FEX platforms.

When you think of the scale (32 FEX x 48 ports = 1,536 ports), that’s pretty impressive. By taking advantage of the cable savings of localized, in-rack cabling without the challenges of increased STP diameter, the FEX and Nexus 7000 make a powerful impact on the data center topology.

As always, I welcome your comments and feedback.