Over the past two years we have made a ton of progress shifting datacenter infrastructure from 1G to 10G+. Most of this has come through a vendor migration back to Cisco for switching, specifically to the Nexus 9372 line. These boxes come with 48 SFP+ ports that run at 1G/10G and another 6 QSFP ports that run at 40G.
Late last year we placed an order to expand our 10G+ coverage in one of our larger datacenters. After meeting with our local Cisco reps and talking through options, we settled on a pair of Nexus 93180YC-EX switches. The new toys offer more flexibility: their 48 SFP+ ports support 1G/10G/25G, and their 6 QSFP ports support 40G/100G.
A week or two ago we used a planned maintenance window to try to bring the new 93180s online. The new switches are connected directly back to the 9372s using four QSFP-40G-CR4 cables. The time comes, we turn up the ports... and they don't come up. We know this cable type works, since we're using it for all of our current interconnects between the 9372s. Unfortunately, because maintenance windows are tight, we have to shut the ports back down and move on to other tasks.
So we go down the normal troubleshooting path. Reseat the cables: still nothing. Remove the port-channel/vPC configuration: nothing. Test the QSFP cables by connecting the two new 93180s to each other: the ports come up, so the cables are good. One of my teammates, who is running with this task, is close to opening a support case with TAC. I double-checked the switch port configurations, and everything looks good. My first thought was a speed/autonegotiation mismatch, since the QSFP ports on the 9372s are fixed at 40G, while the ones on the 93180s can run at either 40G or 100G.
We scheduled another quick, no-downtime maintenance window to test the theory. Each port on both sides of the connection gets the following configuration changes:
Switch(config)# interface x/x
Switch(config-if)# no negotiate auto
Switch(config-if)# duplex full
Switch(config-if)# speed 40000
The time comes, and sure enough, the ports come online.
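Because the same three settings go on every 40G link on both switches, NX-OS interface-range syntax can apply them in one pass instead of port by port. The following is a minimal sketch, assuming the four uplinks sit on Ethernet1/49-52 (hypothetical port numbers; substitute your own) and that the same block is entered on both the 9372 and the 93180 side of each link:

```
Switch(config)# interface ethernet 1/49-52
Switch(config-if-range)# no negotiate auto
Switch(config-if-range)# duplex full
Switch(config-if-range)# speed 40000
```

This is the same per-port change shown above, just applied to a range; the point is that both ends of each link end up with matching, hard-set 40G settings.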
I just wanted to throw this out there in case anyone else runs into the same problem. The fix is easy enough, but autonegotiation isn't always the first thing you think of, especially in a configuration as simple as this one.
I also want to thank the great people in the #CiscoChampions DataCenter group. I ran the problem by them, and they suggested the same likely root cause. It's always great to have a second opinion for confidence, especially when maintenance windows are tight.