SFP ports flap after RSTP topology update

salad
Member
 
Posts: 5
Joined: Wed Sep 26, 2018 10:00 pm
Has thanked: 3 times
Been thanked: 0 time

SFP ports flap after RSTP topology update

Wed Apr 15, 2020 8:40 pm

Howdy,

We run a bunch of WS-12-250-DCs in our network. We standardized on 1.5.1rc6 as it was GA at the time we rolled out, and honestly, these switches have been ROCK solid since! Most of them are installed in top-of-tower timing cabinets with PacketFlux gear to Cambium Canopy equipment. Most of our towers have few enough APs that a single 12-port unit is enough to do everything. We use fiber down the tower to get to whatever is at the base.

The tower base equipment is usually a Juniper switch or a small Cisco router. We run VSTP/Per-VLAN STP on the other gear. Our Netonixes are set up in RSTP mode with RSTP disabled on the port facing the tower base switch to avoid problems arising from having the different types of BPDUs in use.


Today we had a first, which was to add a second one of these switches to a tower. This tower is running a Juniper switch at the base with VSTP on all VLANs and ports. Like the others, the Netonix at the top has RSTP disabled on the port going to the Juniper. RSTP on the box otherwise looked sane. The second WS-12-250-DC is basically a clone of the first. They are connected together on port 11. The first switch is RSTP root as it has a lower bridge ID.
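For anyone unfamiliar with how the root gets picked: spanning tree compares bridge IDs by priority first and then by MAC address, and the numerically lowest one wins. Here is a minimal Python sketch of that comparison. The first bridge ID is the one reported in the switch log further down; the second MAC address is made up purely for illustration, since the real one for switch #2 isn't shown in this post.

```python
def parse_bridge_id(text):
    """Split 'priority.AA-BB-CC-DD-EE-FF' into a comparable (priority, mac_bytes) tuple."""
    priority, mac = text.split(".", 1)
    return int(priority), bytes(int(octet, 16) for octet in mac.split("-"))


def elect_root(bridge_ids):
    """Return the bridge ID string that compares lowest (priority first, then MAC)."""
    return min(bridge_ids, key=parse_bridge_id)


switch1 = "32768.EC-13-B2-82-E7-EC"  # root bridge ID as reported in the log
switch2 = "32768.EC-13-B2-82-F0-01"  # hypothetical MAC, for illustration only

root = elect_root([switch1, switch2])
print("root bridge:", root)
```

With equal priorities (32768 on both), the MAC address alone decides the election, which is why two otherwise cloned switches still end up with a deterministic root.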

Upon connecting the second switch, the two started going nuts:


Code:
Dec 31 19:02:51 Port: link state changed to 'up' (1G) on port 11
Dec 31 19:02:51 STP: msti 0 set port 11 to discarding
Dec 31 19:02:51 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Dec 31 19:02:52 STP: msti 0 set port 11 to learning
Dec 31 19:02:52 STP: msti 0 set port 11 to forwarding
Dec 31 19:02:57 system: starting ntpclient
Apr 15 11:24:43 system: time set by NTP server
Apr 15 11:25:08 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:25:19 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:25:30 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:26:04 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:27:10 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:28:05 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:29:14 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC


The New Root messages continued that way for literally hours until I found a fix.
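To get a feel for how often the root was being re-announced, the gaps between the New Root lines above can be computed with a few lines of Python. This is only a sketch: the syslog timestamps carry no year, so 2020 is assumed from the date of this post, and the function name is my own.

```python
from datetime import datetime

LOG = """\
Apr 15 11:25:08 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:25:19 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:25:30 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:26:04 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:27:10 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:28:05 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 11:29:14 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
"""


def new_root_gaps(log_text, year=2020):
    """Return the seconds between consecutive 'New root' log lines."""
    times = []
    for line in log_text.splitlines():
        if "New root on port" not in line:
            continue
        stamp = " ".join(line.split()[:3])  # e.g. 'Apr 15 11:25:08'
        # The year is an assumption; syslog lines do not include one.
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


gaps = new_root_gaps(LOG)
print("seconds between New Root events:", gaps)
```

The gaps in this excerpt range from roughly 11 seconds to a little over a minute.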

Unfortunately syslog doesn't seem to be working on the first switch, so I have only the internal log to go by. I'm assuming that the following had been happening since the same time:

Code:
Apr 15 16:38:00 monitor: restarting vtss_appl
Apr 15 16:38:01 STP: msti 0 set port 1 to discarding
Apr 15 16:38:01 STP: msti 0 set port 2 to discarding
Apr 15 16:38:01 STP: msti 0 set port 3 to discarding
Apr 15 16:38:01 STP: msti 0 set port 4 to discarding
Apr 15 16:38:01 STP: msti 0 set port 6 to discarding
Apr 15 16:38:01 STP: msti 0 set port 7 to discarding
Apr 15 16:38:01 STP: msti 0 set port 8 to discarding
Apr 15 16:38:02 STP: msti 0 set port 9 to discarding
Apr 15 16:38:02 STP: msti 0 set port 10 to discarding
Apr 15 16:38:02 STP: msti 0 set port 11 to discarding
Apr 15 16:38:02 STP: msti 0 set port 13 to discarding
Apr 15 16:38:02 STP: msti 0 set port 11 to learning
Apr 15 16:38:02 STP: msti 0 set port 11 to forwarding
Apr 15 16:38:04 Port: link state changed to 'down' on port 13
Apr 15 16:38:11 monitor: restarting vtss_appl
Apr 15 16:38:12 STP: msti 0 set port 1 to discarding
Apr 15 16:38:12 STP: msti 0 set port 2 to discarding
Apr 15 16:38:12 STP: msti 0 set port 3 to discarding
Apr 15 16:38:12 STP: msti 0 set port 4 to discarding
Apr 15 16:38:12 STP: msti 0 set port 6 to discarding
Apr 15 16:38:13 STP: msti 0 set port 7 to discarding
Apr 15 16:38:13 STP: msti 0 set port 8 to discarding
Apr 15 16:38:13 STP: msti 0 set port 9 to discarding
Apr 15 16:38:13 STP: msti 0 set port 10 to discarding
Apr 15 16:38:13 STP: msti 0 set port 11 to discarding
Apr 15 16:38:13 STP: msti 0 set port 13 to discarding
Apr 15 16:38:14 STP: msti 0 set port 11 to learning
Apr 15 16:38:14 STP: msti 0 set port 11 to forwarding
Apr 15 16:38:15 Port: link state changed to 'down' on port 13
Apr 15 16:38:23 monitor: restarting vtss_appl
Apr 15 16:38:24 STP: msti 0 set port 1 to discarding
Apr 15 16:38:24 STP: msti 0 set port 2 to discarding
Apr 15 16:38:24 STP: msti 0 set port 3 to discarding
Apr 15 16:38:24 STP: msti 0 set port 4 to discarding
Apr 15 16:38:24 STP: msti 0 set port 6 to discarding
Apr 15 16:38:24 STP: msti 0 set port 7 to discarding
Apr 15 16:38:24 STP: msti 0 set port 8 to discarding
Apr 15 16:38:24 STP: msti 0 set port 9 to discarding
Apr 15 16:38:24 STP: msti 0 set port 10 to discarding
Apr 15 16:38:24 STP: msti 0 set port 11 to discarding
Apr 15 16:38:25 STP: msti 0 set port 13 to discarding
Apr 15 16:38:25 STP: msti 0 set port 11 to learning
Apr 15 16:38:25 STP: msti 0 set port 11 to forwarding
Apr 15 16:38:26 Port: link state changed to 'down' on port 13
Apr 15 16:38:26 Port: link state changed to 'down' on port 14
Apr 15 16:38:34 monitor: restarting vtss_appl
Apr 15 16:38:35 STP: msti 0 set port 1 to discarding
Apr 15 16:38:35 STP: msti 0 set port 2 to discarding
Apr 15 16:38:35 STP: msti 0 set port 3 to discarding
Apr 15 16:38:35 STP: msti 0 set port 4 to discarding
Apr 15 16:38:35 STP: msti 0 set port 6 to discarding
Apr 15 16:38:35 STP: msti 0 set port 7 to discarding
Apr 15 16:38:35 STP: msti 0 set port 8 to discarding
Apr 15 16:38:35 STP: msti 0 set port 9 to discarding
Apr 15 16:38:36 STP: msti 0 set port 10 to discarding
Apr 15 16:38:36 STP: msti 0 set port 11 to discarding
Apr 15 16:38:36 STP: msti 0 set port 13 to discarding
Apr 15 16:38:36 STP: msti 0 set port 11 to learning
Apr 15 16:38:36 STP: msti 0 set port 11 to forwarding
Apr 15 16:38:37 Port: link state changed to 'down' on port 13



...and so on. Apparently an endless loop of bouncing STP on all ports, vtss_appl restarting, and the SFP ports flapping.
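The pattern is easier to see when the log is reduced to the vtss_appl restarts and the link-down events that follow each one. Below is a rough Python sketch that does this for the first few cycles of the log above (only the restart and link-down lines are kept). The year 2020 is assumed because the timestamps don't include one, and the function and field names are mine, not anything from the switch.

```python
import re
from datetime import datetime

LOG = """\
Apr 15 16:38:00 monitor: restarting vtss_appl
Apr 15 16:38:04 Port: link state changed to 'down' on port 13
Apr 15 16:38:11 monitor: restarting vtss_appl
Apr 15 16:38:15 Port: link state changed to 'down' on port 13
Apr 15 16:38:23 monitor: restarting vtss_appl
Apr 15 16:38:26 Port: link state changed to 'down' on port 13
Apr 15 16:38:26 Port: link state changed to 'down' on port 14
Apr 15 16:38:34 monitor: restarting vtss_appl
Apr 15 16:38:37 Port: link state changed to 'down' on port 13
"""

RESTART = "monitor: restarting vtss_appl"
LINK_DOWN = re.compile(r"link state changed to 'down' on port (\d+)")


def restart_cycles(log_text, year=2020):
    """Group link-down events under the vtss_appl restart that precedes them."""
    cycles = []
    for line in log_text.splitlines():
        parts = line.split(None, 3)  # month, day, time, message
        if len(parts) < 4:
            continue
        # The year is an assumption; syslog lines do not include one.
        when = datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S")
        message = parts[3]
        if message == RESTART:
            cycles.append({"restart": when, "ports_down": []})
        elif cycles:
            match = LINK_DOWN.search(message)
            if match:
                cycles[-1]["ports_down"].append(int(match.group(1)))
    return cycles


cycles = restart_cycles(LOG)
restart_gaps = [int((b["restart"] - a["restart"]).total_seconds())
                for a, b in zip(cycles, cycles[1:])]

for cycle in cycles:
    print(cycle["restart"].time(), "ports down:", cycle["ports_down"])
print("seconds between restarts:", restart_gaps)
```

In this excerpt every restart is followed within a few seconds by a link-down on an SFP port (13, and once 14), and the restarts repeat every 11 to 12 seconds.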

I'd like to point out the flapping of the SFP ports, 13 and 14. 13 was a copper SFP connected to a Canopy AP that's been there for some time. I checked to make sure we didn't accidentally have a dual-homed business customer connected, and it turns out the AP actually has no subscribers on it. Port 14 is the uplink to the Juniper switch.

I actually misdiagnosed this as an SFP or cable issue from the Juniper's logs, as we had a few other unforeseen issues during this maintenance activity. At 17:25 I changed the Juniper's config to disable VSTP entirely on the port up to the Netonix. I did this to get rid of the learning phase so our customers would weather the problem better, since there's no chance of an L2 loop between the two switches.

This had no immediate effect. After 14 minutes, the flapping mysteriously stopped on switch #1:

Code:
Apr 15 17:35:06 Port: link state changed to 'down' on port 13
Apr 15 17:35:06 Port: link state changed to 'down' on port 14
Apr 15 17:35:07 STP: msti 0 set port 10 to learning
Apr 15 17:35:07 Port: link state changed to 'up' (1G) on port 14
Apr 15 17:35:07 STP: msti 0 set port 10 to forwarding
Apr 15 17:35:07 STP: msti 0 set port 9 to learning
Apr 15 17:35:07 STP: msti 0 set port 9 to forwarding
Apr 15 17:35:07 STP: msti 0 set port 8 to learning
Apr 15 17:35:07 STP: msti 0 set port 8 to forwarding
Apr 15 17:35:07 STP: msti 0 set port 7 to learning
Apr 15 17:35:07 STP: msti 0 set port 7 to forwarding
Apr 15 17:35:07 STP: msti 0 set port 6 to learning
Apr 15 17:35:07 STP: msti 0 set port 6 to forwarding
Apr 15 17:35:07 STP: msti 0 set port 4 to learning
Apr 15 17:35:08 STP: msti 0 set port 4 to forwarding
Apr 15 17:35:08 STP: msti 0 set port 3 to learning
Apr 15 17:35:08 STP: msti 0 set port 3 to forwarding
Apr 15 17:35:08 STP: msti 0 set port 2 to learning
Apr 15 17:35:08 STP: msti 0 set port 2 to forwarding
Apr 15 17:35:08 STP: msti 0 set port 1 to learning
Apr 15 17:35:08 STP: msti 0 set port 1 to forwarding
Apr 15 17:35:09 Port: link state changed to 'up' (1G) on port 13
Apr 15 17:35:09 STP: msti 0 set port 13 to discarding
Apr 15 17:35:12 STP: msti 0 set port 13 to learning
Apr 15 17:35:12 STP: msti 0 set port 13 to forwarding
Apr 15 17:35:58 monitor: restarting vtss_appl
Apr 15 17:35:59 STP: msti 0 set port 1 to discarding
Apr 15 17:35:59 STP: msti 0 set port 2 to discarding
Apr 15 17:35:59 STP: msti 0 set port 3 to discarding
Apr 15 17:35:59 STP: msti 0 set port 4 to discarding
Apr 15 17:35:59 STP: msti 0 set port 6 to discarding
Apr 15 17:35:59 STP: msti 0 set port 7 to discarding
Apr 15 17:36:00 STP: msti 0 set port 8 to discarding
Apr 15 17:36:00 STP: msti 0 set port 9 to discarding
Apr 15 17:36:00 STP: msti 0 set port 10 to discarding
Apr 15 17:36:00 STP: msti 0 set port 11 to discarding
Apr 15 17:36:00 STP: msti 0 set port 13 to discarding
Apr 15 17:36:00 STP: msti 0 set port 11 to learning
Apr 15 17:36:00 STP: msti 0 set port 11 to forwarding
Apr 15 17:36:01 Port: link state changed to 'down' on port 13
Apr 15 17:36:02 STP: msti 0 set port 10 to learning
Apr 15 17:36:03 STP: msti 0 set port 10 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 9 to learning
Apr 15 17:36:03 STP: msti 0 set port 9 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 8 to learning
Apr 15 17:36:03 STP: msti 0 set port 8 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 7 to learning
Apr 15 17:36:03 STP: msti 0 set port 7 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 6 to learning
Apr 15 17:36:03 STP: msti 0 set port 6 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 4 to learning
Apr 15 17:36:03 STP: msti 0 set port 4 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 3 to learning
Apr 15 17:36:03 STP: msti 0 set port 3 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 2 to learning
Apr 15 17:36:03 STP: msti 0 set port 2 to forwarding
Apr 15 17:36:03 STP: msti 0 set port 1 to learning
Apr 15 17:36:03 STP: msti 0 set port 1 to forwarding
Apr 15 17:36:04 Port: link state changed to 'up' (1G) on port 13
Apr 15 17:36:04 STP: msti 0 set port 13 to discarding
Apr 15 17:36:07 STP: msti 0 set port 13 to learning
Apr 15 17:36:07 STP: msti 0 set port 13 to forwarding
Apr 15 17:37:05 monitor: restarting vtss_appl
Apr 15 17:37:06 STP: msti 0 set port 1 to discarding
Apr 15 17:37:06 STP: msti 0 set port 2 to discarding
Apr 15 17:37:06 STP: msti 0 set port 3 to discarding
Apr 15 17:37:06 STP: msti 0 set port 4 to discarding
Apr 15 17:37:06 STP: msti 0 set port 6 to discarding
Apr 15 17:37:06 STP: msti 0 set port 7 to discarding
Apr 15 17:37:06 STP: msti 0 set port 8 to discarding
Apr 15 17:37:06 STP: msti 0 set port 9 to discarding
Apr 15 17:37:06 STP: msti 0 set port 10 to discarding
Apr 15 17:37:06 STP: msti 0 set port 11 to discarding
Apr 15 17:37:07 STP: msti 0 set port 13 to discarding
Apr 15 17:37:07 STP: msti 0 set port 11 to learning
Apr 15 17:37:07 STP: msti 0 set port 11 to forwarding
Apr 15 17:37:08 Port: link state changed to 'down' on port 13
Apr 15 17:37:09 STP: msti 0 set port 10 to learning
Apr 15 17:37:09 STP: msti 0 set port 10 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 9 to learning
Apr 15 17:37:09 STP: msti 0 set port 9 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 8 to learning
Apr 15 17:37:09 STP: msti 0 set port 8 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 7 to learning
Apr 15 17:37:09 STP: msti 0 set port 7 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 6 to learning
Apr 15 17:37:09 STP: msti 0 set port 6 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 4 to learning
Apr 15 17:37:09 STP: msti 0 set port 4 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 3 to learning
Apr 15 17:37:09 STP: msti 0 set port 3 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 2 to learning
Apr 15 17:37:09 STP: msti 0 set port 2 to forwarding
Apr 15 17:37:09 STP: msti 0 set port 1 to learning
Apr 15 17:37:09 STP: msti 0 set port 1 to forwarding
Apr 15 17:37:11 Port: link state changed to 'up' (1G) on port 13
Apr 15 17:37:11 STP: msti 0 set port 13 to discarding
Apr 15 17:37:13 STP: msti 0 set port 13 to learning
Apr 15 17:37:14 STP: msti 0 set port 13 to forwarding
Apr 15 17:38:00 monitor: restarting vtss_appl
Apr 15 17:38:01 STP: msti 0 set port 1 to discarding
Apr 15 17:38:01 STP: msti 0 set port 2 to discarding
Apr 15 17:38:01 STP: msti 0 set port 3 to discarding
Apr 15 17:38:01 STP: msti 0 set port 4 to discarding
Apr 15 17:38:01 STP: msti 0 set port 6 to discarding
Apr 15 17:38:01 STP: msti 0 set port 7 to discarding
Apr 15 17:38:01 STP: msti 0 set port 8 to discarding
Apr 15 17:38:02 STP: msti 0 set port 9 to discarding
Apr 15 17:38:02 STP: msti 0 set port 10 to discarding
Apr 15 17:38:02 STP: msti 0 set port 11 to discarding
Apr 15 17:38:02 STP: msti 0 set port 13 to discarding
Apr 15 17:38:02 STP: msti 0 set port 11 to learning
Apr 15 17:38:02 STP: msti 0 set port 11 to forwarding
Apr 15 17:38:12 monitor: restarting vtss_appl
Apr 15 17:38:12 STP: msti 0 set port 1 to discarding
Apr 15 17:38:12 STP: msti 0 set port 2 to discarding
Apr 15 17:38:12 STP: msti 0 set port 3 to discarding
Apr 15 17:38:12 STP: msti 0 set port 4 to discarding
Apr 15 17:38:13 STP: msti 0 set port 6 to discarding
Apr 15 17:38:13 STP: msti 0 set port 7 to discarding
Apr 15 17:38:13 STP: msti 0 set port 8 to discarding
Apr 15 17:38:13 STP: msti 0 set port 9 to discarding
Apr 15 17:38:13 STP: msti 0 set port 10 to discarding
Apr 15 17:38:13 STP: msti 0 set port 11 to discarding
Apr 15 17:38:14 STP: msti 0 set port 13 to discarding
Apr 15 17:38:14 STP: msti 0 set port 11 to learning
Apr 15 17:38:14 STP: msti 0 set port 11 to forwarding
Apr 15 17:38:14 Port: link state changed to 'down' on port 13
Apr 15 17:38:16 STP: msti 0 set port 10 to learning
Apr 15 17:38:16 STP: msti 0 set port 10 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 9 to learning
Apr 15 17:38:16 STP: msti 0 set port 9 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 8 to learning
Apr 15 17:38:16 STP: msti 0 set port 8 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 7 to learning
Apr 15 17:38:16 STP: msti 0 set port 7 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 6 to learning
Apr 15 17:38:16 STP: msti 0 set port 6 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 4 to learning
Apr 15 17:38:16 STP: msti 0 set port 4 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 3 to learning
Apr 15 17:38:16 STP: msti 0 set port 3 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 2 to learning
Apr 15 17:38:16 STP: msti 0 set port 2 to forwarding
Apr 15 17:38:16 STP: msti 0 set port 1 to learning
Apr 15 17:38:16 STP: msti 0 set port 1 to forwarding
Apr 15 17:38:17 Port: link state changed to 'up' (1G) on port 13
Apr 15 17:38:18 STP: msti 0 set port 13 to discarding
Apr 15 17:38:20 STP: msti 0 set port 13 to learning
Apr 15 17:38:20 STP: msti 0 set port 13 to forwarding
Apr 15 17:39:07 monitor: restarting vtss_appl
Apr 15 17:39:08 STP: msti 0 set port 1 to discarding
Apr 15 17:39:08 STP: msti 0 set port 2 to discarding
Apr 15 17:39:08 STP: msti 0 set port 3 to discarding
Apr 15 17:39:08 STP: msti 0 set port 4 to discarding
Apr 15 17:39:08 STP: msti 0 set port 6 to discarding
Apr 15 17:39:08 STP: msti 0 set port 7 to discarding
Apr 15 17:39:08 STP: msti 0 set port 8 to discarding
Apr 15 17:39:08 STP: msti 0 set port 9 to discarding
Apr 15 17:39:08 STP: msti 0 set port 10 to discarding
Apr 15 17:39:08 STP: msti 0 set port 11 to discarding
Apr 15 17:39:09 STP: msti 0 set port 13 to discarding
Apr 15 17:39:09 STP: msti 0 set port 11 to learning
Apr 15 17:39:09 STP: msti 0 set port 11 to forwarding
Apr 15 17:39:10 Port: link state changed to 'down' on port 13
Apr 15 17:39:18 monitor: restarting vtss_appl
Apr 15 17:39:19 STP: msti 0 set port 1 to discarding
Apr 15 17:39:19 STP: msti 0 set port 2 to discarding
Apr 15 17:39:19 STP: msti 0 set port 3 to discarding
Apr 15 17:39:19 STP: msti 0 set port 4 to discarding
Apr 15 17:39:19 STP: msti 0 set port 6 to discarding
Apr 15 17:39:19 STP: msti 0 set port 7 to discarding
Apr 15 17:39:19 STP: msti 0 set port 8 to discarding
Apr 15 17:39:19 STP: msti 0 set port 9 to discarding
Apr 15 17:39:20 STP: msti 0 set port 10 to discarding
Apr 15 17:39:20 STP: msti 0 set port 11 to discarding
Apr 15 17:39:20 STP: msti 0 set port 13 to discarding
Apr 15 17:39:20 STP: msti 0 set port 11 to learning
Apr 15 17:39:20 STP: msti 0 set port 11 to forwarding
Apr 15 17:39:22 Port: link state changed to 'down' on port 13
Apr 15 17:39:22 STP: msti 0 set port 10 to learning
Apr 15 17:39:22 STP: msti 0 set port 10 to forwarding
Apr 15 17:39:22 STP: msti 0 set port 9 to learning
Apr 15 17:39:22 STP: msti 0 set port 9 to forwarding
Apr 15 17:39:22 STP: msti 0 set port 8 to learning
Apr 15 17:39:22 STP: msti 0 set port 8 to forwarding
Apr 15 17:39:22 STP: msti 0 set port 7 to learning
Apr 15 17:39:22 STP: msti 0 set port 7 to forwarding
Apr 15 17:39:22 STP: msti 0 set port 6 to learning
Apr 15 17:39:23 STP: msti 0 set port 6 to forwarding
Apr 15 17:39:23 STP: msti 0 set port 4 to learning
Apr 15 17:39:23 STP: msti 0 set port 4 to forwarding
Apr 15 17:39:23 STP: msti 0 set port 3 to learning
Apr 15 17:39:23 STP: msti 0 set port 3 to forwarding
Apr 15 17:39:23 STP: msti 0 set port 2 to learning
Apr 15 17:39:23 STP: msti 0 set port 2 to forwarding
Apr 15 17:39:23 STP: msti 0 set port 1 to learning
Apr 15 17:39:23 STP: msti 0 set port 1 to forwarding
Apr 15 17:39:25 Port: link state changed to 'up' (1G) on port 13
Apr 15 17:39:25 STP: msti 0 set port 13 to discarding
Apr 15 17:39:27 STP: msti 0 set port 13 to learning
Apr 15 17:39:27 STP: msti 0 set port 13 to forwarding
Apr 15 17:56:53 dropbear[3027]: Exit before auth (user 'admin', 1 fails): Exited normally



Switch #2 stopped with the root bridge notifications as well:

Code:
Apr 15 17:38:39 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 17:39:34 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 17:39:45 STP: MSTI0: New root on port 11, root path cost is 20000, root bridge id is 32768.EC-13-B2-82-E7-EC
Apr 15 19:18:07 UI: Configuration changed by admin



I'm assuming that the first switch's process for spanning tree was already marginal in receiving invalid/unknown BPDUs from the Juniper switch and when the second switch was connected and began transmitting RSTP BPDUs it just blew up and crashed repeatedly. I'm assuming that whatever this thing does controls the SFPs (maybe it's in charge of the I2C bus?) so when it dies, it dumps the SFPs, too.

I did a few searches for vtss_appl and found quite a few varied results. I'm not sure if the fixes in 1.5.2 or 1.5.5 apply to this scenario, as the former, while mentioning STP, specifically calls out LAGs, and the latter isn't very specific. I see a lot of people complaining about vtss_appl crashes on 1.5.5, so if I can mitigate this issue through a different config, I don't see any compelling reason to go changing firmware.

If anyone could comment on this issue that would be great. It's truly weird, and, again, the first problem we've ever had with an astounding product.

This is my first post here so apologies if the inline logs are not convention.

Thanks!

Stephen
Employee
 
Posts: 965
Joined: Sun Dec 24, 2017 8:56 pm
Has thanked: 77 times
Been thanked: 169 times

Re: SFP ports flap after RSTP topology update

Thu Apr 16, 2020 3:25 am

Thank you for such a detailed post. If you need any more help after this is resolved, please don't break that habit. It's rare that I get that much information at once, and it is appreciated as it is helpful.

salad wrote:I'm assuming that whatever this thing does controls the SFPs (maybe it's in charge of the I2C bus?) so when it dies, it dumps the SFPs, too.


To summarize, your instincts about what is happening are pretty close.
vtss_appl is the process at the core of a Netonix switch. I'll post a link where I go into significantly more detail about it with someone else, but in essence there are two processes running on the switch that are critical. If you run the 'ps' command in the Linux shell you should be able to see them.

One is 'switch', which is primarily responsible for gathering sensor data related to temperature and PoE power status, and for applying setting changes such as whether PoE is turned on, watchdogs, etc.

The other is 'vtss_appl'. This one is quite a bit more complex, but basically it manages the switch core. The switch has a MIPS32 CPU that runs the OS, and a switch core, which is an ASIC responsible for all of the actual packet switching. vtss_appl reads the config set up from the web UI (or CLI) and tells the switch core how to behave for each configurable protocol. Incidentally, it is also how the SFP modules are detected and managed.

In some instances it is appropriate for vtss_appl to restart, depending on what you change in the web UI before clicking save, but there was a bug plaguing us for a long time that would cause vtss_appl to go into a crash loop. This turned out to be the result of the original code from our provider using static variables inappropriately in a highly multi-threaded environment in various areas of the code.

So basically, when vtss_appl crashes, the effect on your network is equivalent to restarting the switch, because vtss_appl has to determine what the config is supposed to be and then communicate that to the switch core. It also has to re-detect SFP modules. This causes things like STP to run around in circles, which of course can permeate your network with packet loss.

It's possible that different types of BPDU packets streaming in could trigger the instability, but the point really is that any amount of activity that touches on the landmines in that firmware version will cause this same issue to reassert itself.

Your best bet to start out with is to update. However, when you do, be aware that since you are going up several versions, the safest thing to do is to default the switch before the update and redo the config after it is finished, for reasons discussed here

salad
Member
 
Posts: 5
Joined: Wed Sep 26, 2018 10:00 pm
Has thanked: 3 times
Been thanked: 0 time

Re: SFP ports flap after RSTP topology update

Mon Apr 20, 2020 9:26 pm

Stephen wrote:Thank you for such a detailed post. If you need any more help after this is resolved, please don't break that habit. It's rare that I get that much information at once, and it is appreciated as it is helpful.


You're very welcome! Thank you for fully reading!

Stephen wrote:So basically, when vtss_appl crashes, the effect on your network is equivalent to restarting the switch, because vtss_appl has to determine what the config is supposed to be and then communicate that to the switch core. It also has to re-detect SFP modules. This causes things like STP to run around in circles, which of course can permeate your network with packet loss.

It's possible that different types of BPDU packets streaming in could trigger the instability, but the point really is that any amount of activity that touches on the landmines in that firmware version will cause this same issue to reassert itself.


Alright - that makes tons of sense. Bummer. Sorry to hear about the crap code you guys got stuck with.

Stephen wrote:Your best bet to start out with is to update. However, when you do, be aware that since you are going up several versions, the safest thing to do is to default the switch before the update and redo the config after it is finished, for reasons discussed here


Yikes - I think if I can keep this issue from reappearing by watching out for this scenario again that will be sufficient. It sounds like any code change will be a physical replacement of the switch with one that is already configured and tested. We can't risk running into problems with top-of-tower switches.

Is there an archive of the incremental releases available?

Stephen
Employee
 
Posts: 965
Joined: Sun Dec 24, 2017 8:56 pm
Has thanked: 77 times
Been thanked: 169 times

Re: SFP ports flap after RSTP topology update

Mon Apr 20, 2020 10:07 pm

Yes there is

Also, in case you're not using it, take a look at the Netonix Manager. It will make it much easier to stay updated on firmware as it comes out.

salad
Member
 
Posts: 5
Joined: Wed Sep 26, 2018 10:00 pm
Has thanked: 3 times
Been thanked: 0 time

Re: SFP ports flap after RSTP topology update

Fri Apr 24, 2020 3:39 pm

Fantastic, thank you very much!

Stephen
Employee
 
Posts: 965
Joined: Sun Dec 24, 2017 8:56 pm
Has thanked: 77 times
Been thanked: 169 times

Re: SFP ports flap after RSTP topology update

Wed Apr 29, 2020 2:56 am

Glad to be of help.
