How many times have you filled out a change control document to upgrade code on your network devices, detailing the redundancy, the portions of the network impacted, and the application owners notified, only to have it rejected due to "impact"? Prior to my current job at Cisco, this was a common theme. I wished I had a device that would let me roll code without impacting traffic. Fast forward a few years and my wishes have come true with In-Service Software Upgrade (ISSU) in NX-OS.
A brief history lesson: storage switches have had this capability for a long time in the higher-end platforms that are considered director class. It makes sense to have ISSU functionality on Fibre Channel switches because Fibre Channel as a protocol relies on the network to guarantee delivery of frames. Dropping frames means bad things for storage traffic. Moving the capability for ISSU to Ethernet/IP networks makes sense in a modern data center where high-density virtualization and the "always on" mindset prevail. Networking teams have been clamoring for ISSU for a long time. Let's face it, rolling code isn't one of the more exciting things to do on a network, but it's a necessary function. The good news is that we now have it.
We'll focus on ISSU on the Nexus series of devices, though other products in Cisco's portfolio support it as well. To provide a hitless upgrade capability, the device and software require an intrinsic separation of the control plane and the data plane. This allows changes to be made in the control plane, such as the software version, without affecting the data plane, through which the packets and frames that traverse the device pass. NX-OS has been engineered from day one to have this separation of planes. Couple that with years of ISSU experience on the Cisco MDS, and one of my favorite features of NX-OS is born.
So enough talk, let's get into the action. To start an ISSU, we use the install all command as shown below, specifying the kickstart image and the system image to use.
cmhlab-dc2-sw2-otv1# install all kick bootflash:n7000-s1-kickstart.5.1.2.bin system bootflash:n7000-s1-dk9.5.1.2.bin
During the process the install happens before your eyes, which is great for the paranoid among us. :)
Various components are extracted from the kickstart and system files, and verified to minimize the potential for corruption. Below is a sample of the output.
Verifying image bootflash:/n7000-s1-kickstart.5.1.2.bin for boot variable "kickstart".
[####################] 100% -- SUCCESS
Verifying image bootflash:/n7000-s1-dk9.5.1.2.bin for boot variable "system".
[####################] 100% -- SUCCESS
Verifying image type.
[####################] 100% -- SUCCESS
Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS
Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS
Extracting "bios" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS
Extracting "system" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS
Extracting "kickstart" version from image bootflash:/n7000-s1-kickstart.5.1.2.bin.
[####################] 100% -- SUCCESS
Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS
Extracting "lc1n7k" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS
Extracting "cmp" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS
Extracting "cmp-bios" version from image bootflash:/n7000-s1-dk9.5.1.2.bin.
[####################] 100% -- SUCCESS
Performing module support checks
[####################] 100% -- SUCCESS
Notifying services about system upgrade.
[####################] 100% -- SUCCESS
Once that is completed, the install routine shows the type of upgrade per module: a rolling upgrade for the line cards and a reset for the supervisors. Rolling upgrades are non-disruptive because the modules have been engineered to upgrade without dropping link on their ports or disrupting switching.
Compatibility check is done:
Module bootable Impact Install-type Reason
------ -------- -------------- ------------ ------
2 yes non-disruptive rolling
5 yes non-disruptive reset
6 yes non-disruptive reset
9 yes non-disruptive rolling
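If you capture the installer output as part of a scripted pre-check, the compatibility table above is simple to test mechanically. What follows is only a minimal Python sketch, not anything NX-OS provides: the names parse_compat_table and is_hitless are my own, and it assumes the table keeps the whitespace-separated column layout shown above.

```python
def parse_compat_table(text):
    """Parse the ISSU compatibility table into a list of dicts (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric module number; skip the header and the rule line.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        rows.append({
            "module": int(fields[0]),
            "bootable": fields[1] == "yes",
            "impact": fields[2],
            "install_type": fields[3],
            # The Reason column is empty in the sample; keep whatever text follows.
            "reason": " ".join(fields[4:]),
        })
    return rows


def is_hitless(rows):
    """True only if at least one module was parsed and all are bootable and non-disruptive."""
    return bool(rows) and all(
        r["bootable"] and r["impact"] == "non-disruptive" for r in rows
    )


# Sample text copied from the compatibility check shown above.
SAMPLE = """\
Module bootable Impact Install-type Reason
------ -------- -------------- ------------ ------
2 yes non-disruptive rolling
5 yes non-disruptive reset
6 yes non-disruptive reset
9 yes non-disruptive rolling
"""

if __name__ == "__main__":
    rows = parse_compat_table(SAMPLE)
    for r in rows:
        print(r["module"], r["impact"], r["install_type"])
    print("hitless:", is_hitless(rows))
```

A check like this could gate an automated change: if any module reports something other than non-disruptive, stop and read the Reason column before going further.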
Finally, a nice table is presented showing the details of the upgrade, and the installer waits for the green light to continue.
Of course we want to proceed, and then we see this output.
Install is in progress, please wait.
Performing runtime checks.
[####################] 100% -- SUCCESS
Syncing image bootflash:/n7000-s1-kickstart.5.1.2.bin to standby.
[####################] 100% -- SUCCESS
Syncing image bootflash:/n7000-s1-dk9.5.1.2.bin to standby.
[####################] 100% -- SUCCESS
*NOTE* The install routine automatically copies the files to the redundant supervisor for you.
Setting boot variables.
[####################] 100% -- SUCCESS
Performing configuration copy.
[####################] 100% -- SUCCESS
Module 2: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS
Module 5: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS
Module 6: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS
Module 9: Refreshing compact flash and upgrading bios/loader/bootrom.
Warning: please do not remove or power off the module at this time.
[####################] 100% -- SUCCESS
Module 6: Waiting for module online.
-- SUCCESS
Notifying services about the switchover.
[####################] 100% -- SUCCESS
"Switching over onto standby".
Connection closed by foreign host.
At this point, the supervisor that was the secondary (module 6 in my example) has reloaded and come up with the new code. This triggers the primary to initiate a Stateful Switchover (SSO) to the new code running in the control plane. Meanwhile, data is still traversing the switch with no impact. :)
Since our telnet session was disconnected during the SSO (telnet isn't SSO aware), we need to re-establish the session and issue a command to continue monitoring the upgrade.
rfuller@cmhlab-tools:~$ telnet cmhlab-dc2-sw2-otv1
Trying 10.2.0.4...
Connected to cmhlab-dc2-sw2-otv1.csc.dublin.cisco.com.
Escape character is '^]'.
User Access Verification
login: admin
Password:
Cisco Nexus Operating System (NX-OS) Software
TAC support: http://www.cisco.com/tac
Copyright (c) 2002-2010, Cisco Systems, Inc. All rights reserved.
The copyrights to certain works contained in this software are
owned by other third parties and used and distributed under
license. Certain components of this software are licensed under
the GNU General Public License (GPL) version 2.0 or the GNU
Lesser General Public License (LGPL) Version 2.1. A copy of each
such license is available at
http://www.opensource.org/licenses/gpl-2.0.php and
http://www.opensource.org/licenses/lgpl-2.1.php
cmhlab-dc2-sw2-otv1# show install all status
There is an on-going installation...
Enter Ctrl-C to go back to the prompt.
Continuing with installation, please wait
Trying to start the installer...
Module 6: Waiting for module online.
-- SUCCESS
2011 Jan 24 02:34:55 cmhlab-dc2-sw2-otv1 %IDEHSD-STANDBY-2-MOUNT: slot0: online
2011 Jan 24 02:35:06 cmhlab-dc2-sw2-otv1 %CMPPROXY-STANDBY-2-LOG_CMP_UP: Connectivity Management processor(on module 5) is now UP
2011 Jan 24 02:37:55 cmhlab-dc2-sw2-otv1 %IDEHSD-STANDBY-2-MOUNT: logflash: online
Module 2: Non-disruptive upgrading.
-- SUCCESS
Module 9: Non-disruptive upgrading.
-- SUCCESS
Install has been successful.
With that, we've upgraded NX-OS, had the system automatically copy the files to the right locations and modify the boot variables, and didn't drop a frame. How's that for hot?
cmhlab-dc2-sw2-otv1# show ver | i uptime
Kernel uptime is 0 day(s), 0 hour(s), 26 minute(s), 50 second(s)
*NOTE* The kernel has been up for only a short while, but we'll see that the overall system has been up much longer.
cmhlab-dc2-sw2-otv1# show ver | i version
the GNU General Public License (GPL) version 2.0 or the GNU
BIOS: version 3.22.0
kickstart: version 5.1(2)
system: version 5.1(2)
cmhlab-dc2-sw2-otv1# show system uptime
System start time: Tue Oct 26 19:46:38 2010
System uptime: 89 days, 6 hours, 56 minutes, 26 seconds
Kernel uptime: 0 days, 0 hours, 29 minutes, 16 seconds
Active supervisor uptime: 0 days, 0 hours, 19 minutes, 56 seconds
cmhlab-dc2-sw2-otv1#
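The gap between the counters above is the interesting part: the kernel on the supervisor restarted during the upgrade, yet the system as a whole never went down. As a rough illustration, the Python sketch below parses the show system uptime output and compares the two. The function names are hypothetical, and treating "system uptime longer than kernel uptime" as a sign of a non-disruptive upgrade is my own heuristic, not an official check.

```python
import re
from datetime import timedelta

# Matches lines such as "Kernel uptime: 0 days, 0 hours, 29 minutes, 16 seconds".
UPTIME_RE = re.compile(
    r"^[ \t]*(?P<name>[A-Za-z ]+ uptime):\s+"
    r"(?P<d>\d+) days?, (?P<h>\d+) hours?, (?P<m>\d+) minutes?, (?P<s>\d+) seconds?",
    re.MULTILINE,
)


def parse_uptimes(text):
    """Return a dict mapping each uptime label to a timedelta (hypothetical helper)."""
    uptimes = {}
    for match in UPTIME_RE.finditer(text):
        uptimes[match.group("name")] = timedelta(
            days=int(match.group("d")),
            hours=int(match.group("h")),
            minutes=int(match.group("m")),
            seconds=int(match.group("s")),
        )
    return uptimes


def kernel_restarted_without_outage(uptimes):
    """Heuristic (an assumption): the system has been up longer than the kernel."""
    return uptimes["System uptime"] > uptimes["Kernel uptime"]


# Sample text copied from the output shown above.
SAMPLE = """\
System start time: Tue Oct 26 19:46:38 2010
System uptime: 89 days, 6 hours, 56 minutes, 26 seconds
Kernel uptime: 0 days, 0 hours, 29 minutes, 16 seconds
Active supervisor uptime: 0 days, 0 hours, 19 minutes, 56 seconds
"""

if __name__ == "__main__":
    uptimes = parse_uptimes(SAMPLE)
    for name, value in uptimes.items():
        print(f"{name}: {value}")
    print("kernel restarted without outage:", kernel_restarted_without_outage(uptimes))
```

The "System start time" line is deliberately ignored, since it is a timestamp rather than a duration.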
We'll cover ISSU on the Nexus 5000 and Nexus 1000V in the future. I hope this was informative.