Re: v1.5.5rcX Bug Reports and Comments

Posted: Fri Oct 04, 2019 2:06 pm
by Stephen
mayheart,

Is this result on both the IDC and the AC?

I'm having a hard time replicating the problem now with that release.

You mentioned before that you might be able to give me access to the switch. Would it be possible to get access to a computer that has PuTTY and an RS-232 cable plugged into the switch with this firmware loaded?

If not, and if you're willing, I can tell you which commands to run to see how vtss_appl is crashing, so I can get a better idea of what is different between our systems.

Re: v1.5.5rcX Bug Reports and Comments

Posted: Fri Oct 04, 2019 8:49 pm
by mayheart
The switch was taken down to do the voltage level adjustment; since then the vtss_appl messages have not returned. Switch has been running for about 9 hours without any packet loss. Is it possible that this was causing the problems? Or possibly having the switch powered down for a few minutes? It was only quickly power cycled in the past. It's been running 1.5.5rc3-201910040216.

I'll toss the latest build you have for me onto the IDC switch. I've not tried that one since the original beta you sent me. I'll report back.

I did notice the memory usage is pretty high sitting at 116 megs out of 128, not sure if that's you running additional debugging. I'll keep it running in case you need some information off it.

Code:
811 root 20 0 89524 81m 988 S 2.3 66.2 16:04.09 status_thread
743 root 20 0 89568 81m 988 S 0.0 66.2 0:00.75 erps
744 root 20 0 89568 81m 988 R 0.0 66.2 0:00.40 mstp_thread
758 root 20 0 89580 81m 988 S 0.0 66.2 0:00.00 vtss_appl
759 root 20 0 89580 81m 988 S 0.0 66.2 0:55.22 vtss_appl


syslog:

Code:
Dec 31 19:00:06 netonix: 1.5.5rc3-201910040216 on WS-12-250-AC
Dec 31 19:00:10 system: Setting MAC address from flash configuration: EC:13:B2:64:09:0E
Dec 31 19:00:12 system: starting ntpclient
Dec 31 19:00:13 root: adding lan (eth0.997) to firewall zone lan
Dec 31 19:00:26 dropbear[764]: Running in background
Dec 31 19:00:28 switch[790]: temp sensor version 3
Dec 31 19:00:28 switch[791]: Detected cold boot
Dec 31 19:00:33 system: starting ntpclient
Oct 4 10:30:02 system: time set by NTP server

Re: v1.5.5rcX Bug Reports and Comments

Posted: Fri Oct 04, 2019 9:05 pm
by mayheart
Update on the IDC switch:

I get vtss_appl restarting over and over with the first beta build; the latest one is fine. The switch boots up without any issues.

Code:
Jan 1 00:00:09 netonix: 1.5.5rc3-201910040216 on WS-26-400-IDC
Jan 1 00:00:15 system: Setting MAC address from flash configuration: EC:13:B2:11:6F:9E
Jan 1 00:00:18 root: adding lan (eth0) to firewall zone lan
Dec 31 19:00:34 root: removing lan (eth0.997) from firewall zone lan
Dec 31 19:00:39 root: adding lan (eth0.997) to firewall zone lan
Dec 31 19:00:48 root: adding lan (eth0.997) to firewall zone lan
Dec 31 19:00:50 system: starting ntpclient
Oct 4 21:03:25 dropbear[1386]: Running in background
Oct 4 21:03:28 switch[1416]: temp sensor version 3
Oct 4 21:03:28 switch[1417]: Detected warm boot
Oct 4 21:03:29 Port: link state changed to 'up' (1G) on port 25

Re: v1.5.5rcX Bug Reports and Comments

Posted: Fri Oct 04, 2019 9:12 pm
by Stephen
mayheart wrote: The switch was taken down to do the voltage level adjustment; since then the vtss_appl messages have not returned. Switch has been running for about 9 hours without any packet loss. Is it possible that this was causing the problems?


Yes, actually. I've noticed in some of my testing that an unstable power source on one of the lines (most commonly the 3.3V) can cause kernel errors to be thrown, which can destabilize vtss_appl. I've noticed that powering down the unit does help stabilize it as well, though the amount of time it's down is irrelevant from the software perspective. (Unless maybe there is something wonky with the power caps, in which case allowing them to fully discharge might help with certain flagging or something. It might be best to check whether your unit is afflicted with the issue sirhc brought up earlier, to eliminate this as a possibility.)

mayheart wrote:I did notice the memory usage is pretty high sitting at 116 megs out of 128, not sure if that's you running additional debugging.


Actually, I noticed that earlier on one of the test units in my switch farm and I've already corrected the issue (after 4 hours, nothing has gone past 52MB). I will send you a link to the latest build I have. It's starting to sound like our systems are nearly on the same page.

EDIT (just saw your new post on the IDC switch):

That's great! I thought I was totally off base for a bit there.
Let me know if you want me to send you the next release with the mem leak fix.
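
For anyone who wants to confirm on their own unit that memory has stopped growing, here is a minimal sketch that logs a process's resident memory over time. It assumes shell access to the switch, a Linux /proc filesystem, and awk being available there; none of that is confirmed for this firmware, and the function names rss_kb and watch_rss are made up for illustration.

```shell
#!/bin/sh
# Sketch only: log the resident memory (VmRSS) of one process at a fixed
# interval. Assumes /proc/<pid>/status exists and awk is installed.

# Read the text of a /proc/<pid>/status file on stdin and print VmRSS in kB.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }'
}

# Usage: watch_rss <pid> [interval_seconds]
# Prints one timestamped line per sample until the process goes away.
watch_rss() {
    pid="$1"
    interval="${2:-60}"
    while [ -r "/proc/$pid/status" ]; do
        printf '%s %s kB\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
            "$(rss_kb < "/proc/$pid/status")"
        sleep "$interval"
    done
}
```

Something like `watch_rss 759 300` would sample every five minutes, where 759 is just the vtss_appl PID from the process listing earlier in the thread and will differ on each boot. A number that keeps climbing over several hours points to a leak; one that levels off does not.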

Re: v1.5.5rcX Bug Reports and Comments

Posted: Fri Oct 04, 2019 9:23 pm
by mayheart
Sure, send me the latest build, I'll toss it onto both AC and IDC switches and let it sit for the weekend.

Thanks for all your hard work getting this resolved.

Re: v1.5.5rcX Bug Reports and Comments

Posted: Fri Oct 04, 2019 9:27 pm
by Stephen
You're welcome!

I'll send you the latest shortly.

Re: v1.5.5rcX Bug Reports and Comments

Posted: Sat Oct 05, 2019 3:43 am
by Stephen
Ludvik wrote:It is not bug 1.5.5, I've noticed that before. After upgrade snmp does not return right version number.

#snmpget -v 2c -c public curve1.vinarice .1.3.6.1.4.1.46242.1.0
SNMPv2-SMI::enterprises.46242.1.0 = STRING: "Unknown"

I don't know what operations solve it. Reboot, saving configuration, trying snmpwalk (oid .1.3.6.1.4.1.46242), or only time ...

I upgrade 5 switches to 1.5.5rc2, one is OK, four is "unknown"


Hey Ludvik, I had a chance to test this behavior; I was able to replicate it and found the cause. What's happening is that net-snmpd has a cache table that it builds for a few of the OIDs that we get from the switch, and the cache fails to load unless certain values are polled first, which seems to trigger it to load for our custom OIDs. For example, running:

Code:
snmpget -v 2c -c public <switch_ip> .1.3.6.1.2.1.105.1.3.1.1.4.8


For me, this triggered the cache to load 100% of the time for these specific cached OIDs in our MIB.

I'm looking into a way to trigger the cache loading mechanism when the net-snmpd daemon is launched. It should be possible, but in the meantime that should be a usable workaround.

Btw if that doesn't work, just running snmpwalk on the community worked for me too:
Code:
snmpwalk -v 2c -c public <switch_ip>
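
If you want to automate the workaround from a monitoring host, something along these lines might do. It is only a sketch: it assumes the Net-SNMP command-line tools (snmpget, snmpwalk) are installed, the two OIDs are copied from the posts in this thread, and the function and variable names are hypothetical.

```shell
#!/bin/sh
# Sketch only: re-prime the switch's SNMP cache when the firmware version
# OID reads "Unknown". Names below are made up for illustration.

VERSION_OID=".1.3.6.1.4.1.46242.1.0"        # version OID from Ludvik's report
PRIME_OID=".1.3.6.1.2.1.105.1.3.1.1.4.8"    # OID reported above to trigger the cache load

# Succeeds (returns 0) if the given snmpget output reports "Unknown".
is_unknown() {
    case "$1" in
        *'STRING: "Unknown"'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: prime_if_unknown <switch_ip> [community]
# Prints the (possibly refreshed) snmpget line for the version OID.
prime_if_unknown() {
    host="$1"
    community="${2:-public}"

    out=$(snmpget -v 2c -c "$community" "$host" "$VERSION_OID") || return 1
    if is_unknown "$out"; then
        # Poll the priming OID first; fall back to a full walk if that fails.
        snmpget -v 2c -c "$community" "$host" "$PRIME_OID" >/dev/null 2>&1 ||
            snmpwalk -v 2c -c "$community" "$host" >/dev/null 2>&1
        # Re-read the version so the caller sees the current value.
        out=$(snmpget -v 2c -c "$community" "$host" "$VERSION_OID") || return 1
    fi
    printf '%s\n' "$out"
}
```

Calling `prime_if_unknown <switch_ip>` from a poller before reading the version should be harmless on switches that already report correctly, since the extra requests are only sent when the value comes back as "Unknown".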

Re: v1.5.5rcX Bug Reports and Comments

Posted: Sat Oct 05, 2019 5:59 am
by Ludvik
Probably yes, snmpwalk helps. In my management system I run "snmpwalk SNMPv2-SMI::enterprises.46242" if the version number is "unknown", and that seems to work too. It's been a few days since I tested it.

Re: v1.5.5rcX Bug Reports and Comments

Posted: Tue Oct 08, 2019 5:26 pm
by Stephen
mayheart,

Haven't heard from you today, but I did find in my own testing that there was one more memory leak. I fixed that one as well and will send you another build to upload. If you aren't having any issues then I don't see a problem, but it is an option just in case. Either way, it seems like this issue has been resolved, so I am working on some other things that have come up now.

Re: v1.5.5rcX Bug Reports and Comments

Posted: Tue Oct 08, 2019 5:52 pm
by mayheart
I've installed the latest image you sent me, looks good so far.

All my problems seem resolved.