Sunday, January 23, 2011

The Joys of ISSU on Nexus 7000

How many times have you filled out a change control document to upgrade code on your network devices, detailing the redundancy, the portions of the network impacted and the application owners notified, only to have it rejected due to "impact"? Prior to my current job at Cisco, this was a common theme. I wished I had a device that would let me roll code without impacting traffic. Fast forward a few years and my wishes have come true with In Service Software Upgrade (ISSU) within NX-OS.


A brief history lesson - Storage switches have had this capability for a long time in the higher-end platforms that are considered director class. It makes sense to have ISSU functionality on fibre channel switches because fibre channel as a protocol relies on the network to guarantee delivery of frames. Dropping frames means bad things for storage traffic. Moving the capability for ISSU to Ethernet/IP networks makes sense in a modern data center where high density virtualization and the "always on" mindset prevail. Networking teams have been clamoring for ISSU for a long time. Let's face it, rolling code isn't one of the more exciting things to do on a network, but it's a necessary function. The good news is that we now have it.


We'll focus on ISSU on the Nexus series of devices, though know that other products in Cisco's portfolio support it. To provide a hitless upgrade capability, the device and software require an intrinsic separation of the control plane and data plane. This allows changes to be made in the control plane, like the software version, without affecting the data plane, through which the packets and frames that traverse the device pass. NX-OS has been engineered from day one to have this separation of planes. Couple that with years of ISSU experience on the Cisco MDS, and one of my favorite features of NX-OS is born.


So enough talk, let's get into the action. To start an ISSU we use the install all command as shown below where we specify the kickstart image and system image to use.


cmhlab-dc2-sw2-otv1# install all kick bootflash:n7000-s1-kickstart.5.1.2.bin system bootflash:n7000-s1-dk9.5.1.2.bin


During the process the install happens before your eyes, which is great for the paranoid amongst us. :)


Various components are extracted from the kickstart and system files, and verified to minimize the potential for corruption. Below is a sample of the output.

Verifying image bootflash:/n7000-s1-kickstart.5.1.2.bin for boot variable "kickstart".

[####################] 100% -- SUCCESS

Verifying image bootflash:/n7000-s1-dk9.5.1.2.bin for boot variable "system".
[####################] 100% -- SUCCESS

Verifying image type.

[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.

[####################] 100% -- SUCCESS

Extracting "bios" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS

Extracting "system" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS

Extracting "kickstart" version from image bootflash:/n7000-s1-kickstart.5.1.2.bin.

[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS

Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS

Extracting "cmp" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS

Extracting "cmp-bios" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.

[####################] 100% -- SUCCESS

Performing module support checks
[####################] 100% -- SUCCESS

Notifying services about system upgrade.

[####################] 100% -- SUCCESS

Once that is completed, the install routine also shows the type of upgrade per module, reflecting a rolling upgrade for line cards and a reset for the supervisors. Rolling upgrades are non-disruptive because the modules have been engineered to provide this functionality without dropping link on ports or disrupting switching.


Compatibility check is done:


Module  bootable          Impact  Install-type  Reason
------  --------  --------------  ------------  ------
     2       yes  non-disruptive       rolling
     5       yes  non-disruptive         reset
     6       yes  non-disruptive         reset
     9       yes  non-disruptive       rolling
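
If you capture that compatibility table to a file as part of your change record, a few lines of Python can confirm that no module is flagged as anything other than non-disruptive. This is only a rough sketch: the function names are my own invention, the sample text is copied from the table above, and it assumes the columns stay whitespace-separated as they are here.

```python
import re

# Sample text copied from the compatibility check shown in this post.
COMPAT_TABLE = """\
Module  bootable          Impact  Install-type  Reason
------  --------  --------------  ------------  ------
     2       yes  non-disruptive       rolling
     5       yes  non-disruptive         reset
     6       yes  non-disruptive         reset
     9       yes  non-disruptive       rolling
"""

# Module number, bootable flag, impact, install type, optional reason text.
ROW = re.compile(r"^\s*(\d+)\s+(yes|no)\s+(\S+)\s+(\S+)\s*(.*)$")


def parse_compat_table(text):
    """Return one dict per module row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "module": int(m.group(1)),
                "bootable": m.group(2) == "yes",
                "impact": m.group(3),
                "install_type": m.group(4),
                "reason": m.group(5).strip(),
            })
    return rows


def disruptive_modules(rows):
    """Module numbers that are not bootable or not marked non-disruptive."""
    return [r["module"] for r in rows
            if not r["bootable"] or r["impact"] != "non-disruptive"]


rows = parse_compat_table(COMPAT_TABLE)
print(disruptive_modules(rows))  # an empty list means every module is non-disruptive
```

An empty list is what you want to see before giving the installer the green light.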


Finally, the installer presents a nice table showing the details of the upgrade and waits for the green light to continue.




Of course we want to proceed and then we see this output.


Install is in progress, please wait.

Performing runtime checks.

[####################] 100% -- SUCCESS

Syncing image bootflash:/n7000-s1-kickstart.5.1.2.bin to standby.

[####################] 100% -- SUCCESS

Syncing image bootflash:/n7000-s1-dk9.5.1.2.bin to standby.
[####################] 100% -- SUCCESS

*NOTE* The install routine automatically copies the files to the redundant supervisor for you.

Setting boot variables.
[####################] 100% -- SUCCESS

Performing configuration copy.
[####################] 100% -- SUCCESS

Module 2: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 5: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 6: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 9: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 6: Waiting for module online.
-- SUCCESS
Notifying services about the switchover.
[####################] 100% -- SUCCESS
"Switching over onto standby".
Connection closed by foreign host.

At this point, the supervisor that was the secondary (module 6 in my example) has reloaded and come up with the new code. This triggers the primary to initiate a Stateful Switchover (SSO) to the new code running in the control plane. Meanwhile, data is still traversing the switch with no impact. :)


Since our telnet session was disconnected during the SSO (telnet isn't SSO aware), we need to re-establish the session and issue a command to continue monitoring the upgrade.


rfuller@cmhlab-tools:~$ telnet cmhlab-dc2-sw2-otv1

Trying 10.2.0.4...

Connected to cmhlab-dc2-sw2-otv1.csc.dublin.cisco.com.

Escape character is '^]'.

User Access Verification
login: admin
Password:
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2010, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php

cmhlab-dc2-sw2-otv1# show install all status
There is an on-going installation...
Enter Ctrl-C to go back to the prompt.
Continuing with installation, please wait

Trying to start the installer...
Module 6: Waiting for module online.
-- SUCCESS
2011 Jan 24 02:34:55 cmhlab-dc2-sw2-otv1 %IDEHSD-STANDBY-2-MOUNT: slot0: online
2011 Jan 24 02:35:06 cmhlab-dc2-sw2-otv1 %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 5) is now UP
2011 Jan 24 02:37:55 cmhlab-dc2-sw2-otv1 %IDEHSD-STANDBY-2-MOUNT: logflash: online

Module 2: Non-disruptive upgrading.
-- SUCCESS
Module 9: Non-disruptive upgrading.
-- SUCCESS
Install has been successful.

With that, we've upgraded NX-OS, had the system automatically copy the files to the right locations and modify the boot variables, and didn't drop a frame. How's that for hot?
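
For the paranoid amongst us who save the whole session to a file, the installer transcript can also be checked after the fact. The sketch below pairs each step with the status word that follows it. It is a rough illustration only: the function name is made up, the sample lines are copied from the output in this post, and since SUCCESS is the only status shown here, treating any other word as a problem is my assumption.

```python
import re

# A few lines copied from the installer output in this post.
TRANSCRIPT = """\
Performing runtime checks.

[####################] 100% -- SUCCESS

Module 2: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS

Module 6: Waiting for module online.
-- SUCCESS
"""

# A status line is an optional progress bar and percentage, then "-- WORD".
STATUS = re.compile(r"^(?:\[#*\s*\]\s*\d+%\s*)?--\s*(\w+)\s*$")


def step_results(text):
    """Return (step description, status word) pairs in the order they appear."""
    results = []
    step = None
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and the power-off warning are not steps in their own right.
        if not line or line.startswith("Warning:"):
            continue
        m = STATUS.match(line)
        if m:
            if step is not None:
                results.append((step, m.group(1)))
                step = None
        else:
            step = line  # remember the most recent step description
    return results


results = step_results(TRANSCRIPT)
# Assumption: anything other than SUCCESS deserves a closer look.
problems = [(step, status) for step, status in results if status != "SUCCESS"]
print(len(results), "steps checked,", len(problems), "needing attention")
```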

cmhlab-dc2-sw2-otv1# show ver | i uptime


Kernel uptime is 0 day(s), 0 hour(s), 26 minute(s), 50 second(s)


*NOTE* The kernel has been up for only a short while, but we'll see that the overall system has been up much longer.


cmhlab-dc2-sw2-otv1# show ver | i version

the GNU General Public License (GPL) version 2.0 or the GNU

BIOS: version 3.22.0
kickstart: version 5.1(2)
system: version 5.1(2)

cmhlab-dc2-sw2-otv1# show system uptime
System start time: Tue Oct 26 19:46:38 2010
System uptime: 89 days, 6 hours, 56 minutes, 26 seconds
Kernel uptime: 0 days, 0 hours, 29 minutes, 16 seconds
Active supervisor uptime: 0 days, 0 hours, 19 minutes, 56 seconds

cmhlab-dc2-sw2-otv1#
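
The gap between those counters is the whole story of an ISSU: the chassis has been up for months while the kernel and the active supervisor restarted minutes ago. If you want to check that from a saved copy of the output, here is a small sketch. The function name is hypothetical and the regular expression assumes the exact "N days, N hours, N minutes, N seconds" layout shown above.

```python
import re
from datetime import timedelta

# Output copied from "show system uptime" in this post.
UPTIME_OUTPUT = """\
System start time: Tue Oct 26 19:46:38 2010
System uptime: 89 days, 6 hours, 56 minutes, 26 seconds
Kernel uptime: 0 days, 0 hours, 29 minutes, 16 seconds
Active supervisor uptime: 0 days, 0 hours, 19 minutes, 56 seconds
"""

# Matches only the "... uptime:" lines; the start-time line is ignored.
LINE = re.compile(
    r"^(?P<name>[A-Za-z ]+ uptime):\s*"
    r"(?P<d>\d+) days, (?P<h>\d+) hours, (?P<m>\d+) minutes, (?P<s>\d+) seconds\s*$"
)


def parse_uptimes(text):
    """Map each uptime label to a timedelta."""
    uptimes = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            uptimes[m.group("name")] = timedelta(
                days=int(m.group("d")),
                hours=int(m.group("h")),
                minutes=int(m.group("m")),
                seconds=int(m.group("s")),
            )
    return uptimes


uptimes = parse_uptimes(UPTIME_OUTPUT)
# True when the chassis stayed up across the kernel restart.
print(uptimes["System uptime"] > uptimes["Kernel uptime"])
```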

We'll cover ISSU on the Nexus 5000 and Nexus 1000V in the future. Hope this was informative.

Tuesday, January 18, 2011

Here we go.....

I finally decided I needed to do some blogging, so here we go. Before we get into the fun stuff, let's talk a bit about who I am. This will help you decide if you are in the right place or not.

My name is Ron Fuller and I work as a Technology Solutions Architect with Cisco in Dublin, Ohio. I work with our Enterprise customers on data center architecture, which means I'm not a product guy per se. Architectures can be enabled by a product or suite of products, though I happen to think some enable it better than others. ;) I am a dual CCIE #5851 (Routing and Switching and Storage Networking) and have held a myriad of certifications from other vendors including Novell - where I started my certification track and was a Master CNE - VMware, SNIA, Microsoft, HP, Okidata, IBM, ISC2, CompTIA and more. Certifications were a focal point for me early in my career and certainly opened doors that would have otherwise remained closed in tough times.

I have had the opportunity to be published a few times and my latest effort was a collaboration with two great guys whom I am lucky to call friends as well, David Jansen and Kevin Corbin. We created NX-OS and Cisco Nexus Switching: Next-Generation Data Center Architectures with Cisco Press. The book was released last June and we're already working on a 2nd Edition because of the many changes and innovations NX-OS has brought to market in the last few months and those coming! I have a passion for NX-OS and if you've been following me on Twitter (@ccie5851) you might have picked up on it. ;) I have a sticker on my laptop that says it all.



On a personal front, my wife and I have four awesome, smart, creative, cute....you get the picture...kids. We live north of Columbus, OH and love to travel - WITH the kids - especially if there is an F1 race involved. We've become very adept at long haul travel with kids and have taken them with us to Japan, England, France, Germany, Australia and our last big adventure, China. I may blog about the science of traveling with little ones in the future. We think we've got a good system but may be biased.

As I mentioned earlier, F1 is a great excuse to travel and for that matter, I'm a fan of most autosports though F1 holds a special place in my heart. It is the perfect integration of technology (I'm a geek after all!) and speed, exotic locations and competition. I do watch Indycar and it's probably best to say I monitor NASCAR. NASCAR has so many races and they are so long that it becomes quite the commitment to actually WATCH every race. I still miss the days of Dale and Rusty beating and banging on each other, but as with all things, change happens.

I'm sure more of my idiosyncrasies will emerge as I write, but know that I plan to discuss NX-OS and Nexus switching, some UCS action, MDS and whatever else comes up. It's an exciting time in the Data Center space and I couldn't be happier to be hip-deep in the action!

Thanks for taking the time and see you around.