So we have multiple switches on 1.4.7 or near that, with 3-4 year uptimes, that we need to upgrade to 1.5.11 to keep access from Chrome. Do you recommend rebooting the switch from the GUI before performing the firmware upgrade, to prevent any weird bugs or memory leaks from causing a failed upgrade?
I feel like it's safe to reboot first, then upgrade, but what do you recommend @sirhc?
v1.5.11 Bug Reports and Comments
- IntL-Daniel
- Experienced Member
- Posts: 170
- Joined: Mon Nov 02, 2015 5:07 pm
- Location: Czech Republic
- Has thanked: 7 times
- Been thanked: 9 times
Re: v1.5.11 Bug Reports and Comments
sirhc wrote: IntL-Daniel wrote: From the log of the WS-10-250-AC just after the upgrade and reboot:
Jan 1 01:00:56 dropbear[1706]: Failed listening on '22': Error listening: Address already in use
Jan 1 01:00:56 dropbear[1706]: Early exit: No listening ports available.
Jan 1 01:00:56 dropbear[1713]: Running in background
Could you please provide the MAC address so I can check our records? I am not sure why people are always concerned about providing the MAC address. It is not a security hole, because a device cannot be addressed at the hardware layer (MAC) from outside its routed subnet, only from within the same subnet on the same flat network segment.
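A short illustration of the MAC point above: a host only needs a destination's own MAC address when the destination IP is on its directly attached subnet; otherwise the frame is addressed to the gateway's MAC, so a remote host never gets to use the switch's MAC at all. A minimal Python sketch of that on-link test, assuming one IP subnet per flat Layer 2 segment (the addresses are made up):

```python
import ipaddress


def is_on_link(local_interface: str, dest_ip: str) -> bool:
    """Return True if dest_ip is on the same subnet as local_interface.

    local_interface is given in CIDR form, e.g. "192.168.1.20/24".
    On-link destinations are reached at their own MAC address (resolved via ARP
    for IPv4); anything else is sent to the default gateway's MAC address instead.
    """
    iface = ipaddress.ip_interface(local_interface)
    return ipaddress.ip_address(dest_ip) in iface.network


if __name__ == "__main__":
    print(is_on_link("192.168.1.20/24", "192.168.1.1"))  # frame goes to that host's own MAC
    print(is_on_link("192.168.1.20/24", "10.0.0.5"))     # frame goes to the gateway's MAC
```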
The WS-10-250-AC is EOL; it was first manufactured in October 2015 and last manufactured in October 2017.
The WS-10-250-AC spans three board revisions: D, E, and F.
This is important because a board revision only increases when something changed in the hardware; sometimes the change is innocuous, sometimes not.
It is also important to tell us which firmware version you are upgrading from.
Updated:
We went ahead and tested the WS-10-250-AC Rev D, E, and F, since we did not know which revision yours was.
We updated from whatever firmware version was on the units, since, again, that information was not provided.
Our tests showed that v1.5.11 worked on all three board revisions (D, E, F), and we did not see your error.
The error message being reported comes from dropbear, the SSH server, and 22 is the port it is attempting to listen on.
Since you can only get the boot log/scroll from the console, I assume your serial port is working just fine. The question is why dropbear is complaining that port 22 is already in use.
This is "my" theory, since I have not yet talked to one of the programmers:
Depending on how OLD your firmware was, there were security issues in OLD firmware pertaining to dropbear, patched in previous releases, that could allow a hacker to upload malware/viruses to your flash. You could clear this with a power-on factory default; I think it would also clear with a normal software default, and I assume a firmware upgrade would clear it too, but I am not sure. Again, I want to stress that this is "my" theory and I am not one of the programmers, so I could be completely wrong. I have asked the programmers for their theory as to why this happens and will pass it on to you when I hear back from them.
However, I want to reiterate that we have tested the WS-10-250-AC Rev D, E, and F and did not see this error, and all board functions worked properly on all three board revisions.
Thanks for the investigation. My switch is ec:13:b2:c1:33:9a, board F, updated from 1.5.5 to 1.5.11. I double-checked the same steps again, with the same result in the log. But after another reboot on 1.5.11, the messages were no longer there. So it happens only just after the upgrade.
In fact, this unit is the one I reported in the other thread: it randomly reboots (within 1-3 days of uptime) if the firmware is higher than 1.5.5, BUT on 1.5.5 it is rock stable! Unfortunately 1.5.11 also reboots. As mentioned in that other thread, just before each reboot our external syslog catches a false message saying the switch is rebooting because the reset button was pressed (there is no issue with the reset button). I do not understand how, if there is a hardware issue with the switch, it can be stable on 1.5.5 but randomly reboot on higher releases :-(.
Your theory about a virus on the flash sounds promising, but I had already done a reset by holding the reset button for 20+ seconds at startup (to reformat the flash?), so there is probably no more hope?
Re: v1.5.11 Bug Reports and Comments
96 Netonixes upgraded. So far no issues. Pity the Netonix Manager is buggy as hell
2 x WS-26-500-DC
1 x WS-24-400A
2 x WS-12-250-AC
34 x WS-12-250-DC
57 x WS-8-150-DC
sirhc - Employee
- Posts: 7398
- Joined: Tue Apr 08, 2014 3:48 pm
- Location: Lancaster, PA
- Has thanked: 1604 times
- Been thanked: 1322 times
Re: v1.5.11 Bug Reports and Comments
scracha wrote: 96 Netonixes upgraded. So far no issues. Pity the Netonix Manager is buggy as hell
2 x WS-26-500-DC
1 x WS-24-400A
2 x WS-12-250-AC
34 x WS-12-250-DC
57 x WS-8-150-DC
Thank you for your firmware report.
Are you running the latest Manager version?
Also, the Manager's performance and stability can depend on the Linux version it runs on, whether that system is kept updated, and whether it is firewalled against attacks from outside.
Some older versions of the Manager will not auto-update, so you basically need to uninstall and reinstall them.
Support is handled on the Forums not in Emails and PMs.
Before you ask a question, use the Search function to see if it has been answered before.
To do an Advanced Search click the magnifying glass in the Search Box.
To upload pictures click the Upload attachment link below the BLUE SUBMIT BUTTON.
sirhc - Employee
Re: v1.5.11 Bug Reports and Comments
IntL-Daniel wrote:Thanks for investigation, my switch is ec:13:b2:c1:33:9a, board F, updated from 1.5.5 to 1.5.11.
Double checked the same steps now again with the same result in the log.
But another reboot on 1.5.11 and the messages no more there.
So only just after the upgrade.
If the message disappears after the second reboot and stays gone on all subsequent reboots, then IMO I would not concern myself with it.
IntL-Daniel wrote: In fact this unit is the one I had reported in other thread and it is randomly rebooting (in 1-3 days of uptime) if the firmware is higher than 1.5.5 BUT on 1.5.5 it is rock stable! Unfortunately 1.5.11 also rebooting (as mentioned in that other thread - there is a false (there is no issue with reset button) message caught via external syslog just before the reboot that it is rebooting because of reset was pressed). I do not understand if there is a hw issue with the switch how it can be stable on 1.5.5 but randomly reboots on higher releases :-(.
Interesting. I will have to review the release notes, but at one time we accidentally disabled the reset button in a firmware release; see where I am going with this? Also, way back then there was an issue where the reset button was soldered onto the board in a position that placed it too close to the front of the chassis. Keep in mind this is what is called a through-hole part, and the holes in the board allow a +/- tolerance, so depending on where the part got soldered, some default buttons came into contact with the chassis or were even depressed when mounted. I cannot remember how we finally fixed it: either we moved the holes back on the board, moved the mounting post back on the chassis, or both. I do know that to fix the boards that failed in production, we simply sent them to RMA, where the tech manually moved the switch back. This would be very hard to do without the proper equipment; a regular soldering iron will not cut it unless you use solder wick to remove all the solder from each hole one at a time, hold the switch back, and then solder each hole one at a time. We simply use a hot-air soldering station to heat up the board area, which melts the solder in all the holes at the same time. Depending on how close the switch is to contacting the chassis, another jerry-rigged fix is to file the white button down slightly, but you cannot remove too much because the white button is hollow, and you can end up with a reset button that looks like a tube and is hard to press.
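Netonix has not said how its firmware reads the reset button, so the following is only a generic illustration of the theory above: firmware that polls a button and acts once it reads "pressed" continuously for a hold time will fire if the chassis keeps the button pushed in, while a build with the button handler disabled would ignore the same hardware fault. The polling interval, hold time and function name are hypothetical.

```python
from typing import Iterable, Optional

SAMPLE_MS = 10         # hypothetical polling interval for the button input
RESET_HOLD_MS = 1000   # hypothetical hold time before a reset is triggered


def first_reset_time(samples: Iterable[bool], handler_enabled: bool = True) -> Optional[int]:
    """Return the time in ms at which a reset would fire, or None if it never does.

    Each sample is one poll of the button input (True = pressed). The button
    must read pressed for RESET_HOLD_MS without interruption; any released
    sample restarts the count, which filters out brief contact bounce.
    """
    if not handler_enabled:
        return None  # a build that ignores the button never resets from it
    held_ms = 0
    for index, pressed in enumerate(samples):
        held_ms = held_ms + SAMPLE_MS if pressed else 0
        if held_ms >= RESET_HOLD_MS:
            return (index + 1) * SAMPLE_MS
    return None
```

With these values, a button held in continuously fires after 1000 ms, a bouncing contact that keeps releasing never fires, and with handler_enabled=False the same input is ignored.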
IntL-Daniel wrote:Your theory about the virus on flash sounds promising but I had already done reset with 20+ sec reset button hold on start (to reformat flash?) so there is probably no more hope?
Thanks, I thought so anyway. But as you stated, after the upgrade that error only occurred on the first reboot, so if it goes away after that I would not worry about it. The most I would do is confirm you can SSH into the switch to make sure dropbear is working properly, as you never know when you will NEED to SSH into a device for whatever reason, such as someone forgetting to update TLS.
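One quick way to confirm something is actually listening for SSH, short of logging in: an SSH server sends an identification line starting with "SSH-" as soon as a TCP connection is made (RFC 4253, section 4.2). A minimal Python sketch, assuming the switch's management IP is reachable and SSH is on the default port 22; the function name is illustrative.

```python
import socket

MAX_LINES = 20         # servers may send a few other lines before the "SSH-" line
MAX_LINE_BYTES = 256   # RFC 4253 caps the identification string at 255 chars


def ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Connect to host:port and return the SSH server's identification line.

    Raises OSError (e.g. ConnectionRefusedError, or a timeout) if nothing answers,
    or ConnectionError if the peer never sends an "SSH-" line.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock, \
            sock.makefile("rb") as stream:
        for _ in range(MAX_LINES):
            line = stream.readline(MAX_LINE_BYTES)
            if not line:
                break  # connection closed by the server
            text = line.decode("ascii", errors="replace").strip()
            if text.startswith("SSH-"):
                return text
    raise ConnectionError("no SSH identification string received")
```

A dropbear server usually includes "dropbear" in the returned string; a ConnectionRefusedError means nothing is listening on that port.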
sirhc - Employee
Re: v1.5.11 Bug Reports and Comments
dcshobby wrote: So we have multiple switches with 1.4.7 or near that with 3-4 year uptimes that we need to upgrade to 1.5.11 to keep access from Chrome. Do you recommend rebooting the switch from the GUI before performing the firmware upgrade to prevent any weird bugs or memory leaks causing a failed upgrade?
I feel like it's safe to reboot first, then upgrade but what do you recommend @sirhc
Yes, it is always a good idea to reboot prior to an update, especially if the switch has been up a LONG time or is far away.
Do I always reboot my switches before I update them? No, not always, but I know it is good practice; then again, if I mess it up I do not have to run out there, I call an employee and make them run. Have I ever had a WS firmware update fail? Not in a long time; I have seen an update fail, but that was years ago. I have also updated switches that had been up well over a year, because I often deliberately leave some switches un-updated to look for memory leaks and see how long they will run. The longest uptime I have seen on a WS is 3+ years, at which point I felt I had better upgrade it.
- JockGraham
- Member
- Posts: 11
- Joined: Fri Apr 22, 2016 5:44 pm
- Has thanked: 1 time
- Been thanked: 0 time
Re: v1.5.11 Bug Reports and Comments
No worries MAC : EC:13:B2:E1:21:58
EC:13:B2:E1:74:E8
Warning: Expected 1 fan but found 0
- IntL-Daniel
Re: v1.5.11 Bug Reports and Comments
sirhc wrote:
If message disappears after second reboot and all subsequent reboots I would not concern myself with it then. IMO
BTW, this port 22 message is not limited to one switch; it appears on ALL the switches we have unboxed here. I just took another new WS-12-250-AC and got the same result in the log after the upgrade (this time from 1.5.10 to 1.5.11):
Jan 1 00:00:14 system: Setting MAC address from flash configuration: EC:13:B2:65:EB:D6
Jan 1 00:00:17 admin: adding lan (eth0) to firewall zone lan
Jan 1 00:00:18 dropbear[780]: Running in background
Jan 1 00:00:19 netonix: 1.5.11 on WS-12-250-AC
Jan 1 00:00:21 UI: web config OK
Dec 31 19:00:36 admin: removing lan (eth0) from firewall zone lan
Dec 31 19:00:39 admin: adding lan (eth0) to firewall zone lan
Dec 31 19:00:41 admin: removing lan (eth0) from firewall zone lan
Dec 31 19:00:44 admin: adding lan (eth0) to firewall zone lan
Dec 31 19:00:51 admin: adding lan (eth0) to firewall zone lan
Dec 31 19:00:52 dropbear[1661]: Failed listening on '22': Error listening: Address already in use
Dec 31 19:00:52 dropbear[1668]: Running in background
Dec 31 19:00:53 STP: msti 0 set port 12 to discarding
Dec 31 19:00:56 STP: msti 0 set port 12 to learning
Dec 31 19:00:56 STP: msti 0 set port 12 to forwarding
Dec 31 19:00:56 switch[1714]: Detected warm boot
Dec 31 19:00:59 switch[1713]: temp sensor version 1
As you said, "I would not concern..." but I prefer to know the reason for all unexpected behaviour...
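Both logs show the same pattern: one dropbear process fails to bind port 22, then another reports "Running in background". A minimal Python sketch that flags whether a bind failure was followed by a dropbear process starting, matching the exact message text from the logs above; the function name is illustrative. "Running in background" only shows that a dropbear process daemonized, so an actual SSH connection remains the real test.

```python
import re
from typing import Optional

BIND_FAILED = re.compile(r"dropbear\[\d+\]: Failed listening on '\d+'")
STARTED = re.compile(r"dropbear\[\d+\]: Running in background")


def dropbear_recovered(log_lines: list[str]) -> Optional[bool]:
    """Check a syslog excerpt for a dropbear bind failure and a later start.

    Returns None if no bind failure is logged, True if some dropbear process
    logged "Running in background" after the last failure, else False.
    """
    last_failure = None
    for index, line in enumerate(log_lines):
        if BIND_FAILED.search(line):
            last_failure = index
    if last_failure is None:
        return None
    return any(STARTED.search(line) for line in log_lines[last_failure + 1:])
```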
sirhc wrote: Interesting. I will have to review the release notes, but at one time we accidentally disabled the reset button in a firmware release; see where I am going with this? Also, way back then there was an issue where the reset button was soldered onto the board in a position that placed it too close to the front of the chassis...
In this case, I had already done a TEST with the reset button completely removed from the mainboard, but there was no change in behaviour: it still reboots with the same reason message. But as written, 1.5.5 = ZERO reboots, while anything higher REBOOTS every 1-3 days with "reset" mentioned in the log. If you have an idea which firmware changes it could be sensitive to, that could help me. On the other hand, it is clear now that it must be related to some hardware issue (no other WS-10/12-250-AC switch is doing the same), BUT if it were purely a hardware issue, I would expect the behaviour to be the same regardless of firmware!?
sirhc - Employee
Re: v1.5.11 Bug Reports and Comments
IntL-Daniel
I have no idea why you are seeing the dropbear error on reboot. Since you have stated that you see it on units that were defaulted and even on a new unit, the malware/virus theory is debunked; that was a wild-ass guess, grasping at straws.
We have looked for your error message on units here at our facility and have not seen it.
Linux is a multitasking operating system, and services load and begin running simultaneously. Sometimes a service that attempts to start will incur an error due to timing. It could simply be that the service tries to listen on port 22 and, for whatever reason, fails, but then starts again and obviously does load, which you can verify simply by SSHing into the unit. If dropbear were not running and listening on port 22 (unless you changed the port in the configuration), you could not SSH into the device.
So far we can see that v1.5.11 has been installed on thousands of units, all running OK.
As I said, I have asked the developers for their best guess, and when I talk with them I will let you know. But if the unit boots and runs and you can SSH into it, that tells you dropbear did load on the second attempt and is running, even though for whatever reason it failed on the first attempt. You see it, but I cannot recreate it, and I have no other reports of it; that does not mean no one else has it, only that they have not noticed it and it has not affected their units' operation. As I said, it could be something you set on the unit, and I have no way of knowing since I am not there. Could it be some small change you make to the config that neither I nor anyone else does? If you can figure out how I can reproduce it and see it, I can better explain it or, if needed, get it fixed.
At this point, with the semiconductor shortages and the need to use substitute parts, which require firmware changes, our developers are pretty busy. We had to drop everything to fix TLS, which we did within the time frame we promised. We acknowledge that we should have fixed it a long time ago; we were aware of it and messed up by continually pushing it down the to-do list, because browsers kept setting deadlines for dropping support for older TLS but also kept extending them, so it became a game of chicken that we obviously lost. The developers have now returned to the task at hand, which is finishing the code that will allow us to resume manufacturing, which has been stopped for months. Obviously, if we cannot sell units we make no money and cannot continue to exist. The worldwide semiconductor shortage has already caused many small companies like us to close, and more will continue to close. The worldwide political and military situation has the entire world hanging on a cliff. If Russia goes into Ukraine, energy prices around the world will surge, further driving inflation. If China goes into Taiwan, possibly 60% of the world's semiconductor supply goes offline; one can only imagine what that landscape will look like, but it is not good.
As for your WS-10-250-AC running fine on v1.5.5 but rebooting on v1.5.11, while v1.5.11 runs just fine on all of your other switches: that would indicate something is wrong with the hardware on that unit, possibly flash but more likely memory, though who knows. The unit is 5-7 years old, and each firmware version generally gets a little bigger and uses a little more memory. Alternatively, this unit is configured slightly differently from your other units, and some service or firmware configuration causes the issue.
If this were my unit, I would remove it from service and bring it back to my shop.
I would then factory default it and let it run with no configuration changes from default, to see whether the reboots still occur.
I would then re-enable the services and settings from the in-service configuration, but only one at a time (or in small groups), letting it run after each change to see if it reboots.
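The one-change-at-a-time procedure above boils down to a simple loop. A minimal sketch in which soak_reboots is a hypothetical stand-in for "apply exactly these settings, let the unit run, and report whether it rebooted", and any setting names you pass in are your own; since the reported reboots come every 1-3 days, each soak would need to run well beyond that to mean anything.

```python
from typing import Callable, Optional, Sequence


def find_trigger(settings: Sequence[str],
                 soak_reboots: Callable[[list[str]], bool]) -> Optional[list[str]]:
    """Re-enable settings one at a time on top of a factory default.

    soak_reboots(enabled) is a hypothetical callback: apply exactly these
    settings, let the unit run for the soak period, return True if it rebooted.
    Returns the enabled list at the first reboot (its last item is the suspect,
    and [] means it rebooted at factory default), or None if it never rebooted.
    """
    for count in range(len(settings) + 1):
        enabled = list(settings[:count])  # count == 0 is the factory-default baseline
        if soak_reboots(enabled):
            return enabled
    return None
```

An empty list as the result means the unit reboots even at factory defaults, which points toward hardware rather than configuration.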
Now, if the unit you put in place of this one behaves the same as the one removed and starts to reboot, then you know it is something unique to that environment or network segment that affects the unit differently between v1.5.5 and v1.5.11. It could even be that some outside actor has located that device and is targeting it. If the unit is being targeted, you could configure the firewall / access control list on the switch to prevent outside attacks, change the IP, or take other actions to limit access to the unit.
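The idea behind the access-control suggestion is an allowlist: only management traffic from known networks may reach the switch. This is not the Netonix ACL syntax, just a minimal Python sketch of the matching logic, with hypothetical management networks:

```python
import ipaddress

# Hypothetical management networks that should be able to reach the switch.
MANAGEMENT_NETWORKS = [ipaddress.ip_network(n) for n in ("10.10.0.0/24", "192.168.88.0/24")]


def is_allowed(src_ip: str) -> bool:
    """Return True if src_ip falls inside one of the allowed management networks."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in MANAGEMENT_NETWORKS)
```

Anything else, for example a scanner at 203.0.113.7, would be dropped before it reaches the switch's management services.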
If you can help narrow it down, or come to the conclusion that it is hardware, such as bad memory that causes a reboot when too much memory is addressed, that would help.
But at this time I have no way to reproduce the error; all we can see is that dropbear fails to load at first but then seems to load and operate normally.
sirhc - Employee
Re: v1.5.11 Bug Reports and Comments
JockGraham wrote:No worries MAC : EC:13:B2:E1:21:58
EC:13:B2:E1:74:E8
Warning: Expected 1 fan but found 0
MAC : EC:13:B2:E1:21:58
Model WS-8-150-DC
Mfg Date 10/09/2016 <= THIS UNIT DEFINITELY HAS THE OLDER MODEL FAN
Date Ordered 12/09/2016
Order # 145002571
Date Shipped 12/14/2016
Shipment # 100002563
Sold To Streakwave Wireless
MAC : EC:13:B2:E1:74:E8
Model WS-8-150-DC
Mfg Date 12/19/2017 <= THIS UNIT MOST LIKELY HAS THE OLDER MODEL FAN
Date Ordered 12/04/2017
Order #145004471
Date Shipped 12/19/2017
Shipment # 100004485
Sold To Streakwave Wireless
Both of these fans are 4-5 years old and, depending on the environment (dusty or warm), could easily be failing. I have to replace the fans in my Cisco routers about every 5 years at each of my tower sites. Computer fans also tend to fail at around 5 years, give or take, because electronics act like ionizers and attract dust, which is why even in clean offices we need to clean and service computers every year, and they are still packed with dust when we do.
Truth be told, the original model fans were better fans. We did not change fans because we wanted to; we had no choice, as the original 4-armature fan was discontinued and replaced with a 2-armature one. We even attempted to switch manufacturers, as we did not like the replacement fan, but the other manufacturer's fan was about the same as the original manufacturer's replacement. We standardized on the second manufacturer only because their lead time was much shorter; it had nothing to do with money, as both were exactly the same price.
But given the age of these units, they probably have the original fan, and the way we drive the fans in newer firmware had to change so we could detect and support both fan types. So it is possible the fan is dirty and/or failing but will still start under the older firmware, because the way the firmware starts the fan had to change. To achieve multiple fan speeds we use what is called pulse width modulation (PWM), which means switching the voltage ON and OFF at a specific frequency, with the fraction of each cycle spent ON setting the speed. Each fan type requires a different frequency, and the better, older fan can start up more easily even if dirty or failing, as it is 4-armature versus 2. So to spin up the newer fan we have to apply FULL power with no modulation and then slow it down; the older fan would start up without the full-power ramp-up.
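To put numbers on the PWM description: at a given switching frequency, the duty cycle (the fraction of each period the voltage is ON) sets the average power to the fan, and the kick-start described for the newer fan amounts to holding 100% duty briefly before stepping down to the target. A minimal Python sketch; the 25 kHz in the example below is the common PC-fan PWM frequency, and the kick and step timings are hypothetical, not Netonix values.

```python
from dataclasses import dataclass


@dataclass
class PwmTiming:
    period_us: float  # length of one ON+OFF cycle in microseconds
    on_us: float      # time the fan voltage is switched ON in each cycle
    off_us: float     # time the fan voltage is switched OFF in each cycle


def pwm_timing(freq_hz: float, duty: float) -> PwmTiming:
    """ON/OFF times for one PWM cycle at freq_hz with the given duty cycle (0.0-1.0)."""
    if freq_hz <= 0 or not 0.0 <= duty <= 1.0:
        raise ValueError("freq_hz must be positive and duty between 0 and 1")
    period = 1_000_000 / freq_hz
    return PwmTiming(period, period * duty, period * (1.0 - duty))


def startup_profile(target_duty: float, kick_ms: int = 2000, step_ms: int = 500,
                    steps: int = 4) -> list[tuple[int, float]]:
    """(time_ms, duty) schedule: full power (no modulation) first, then step down.

    The fan runs at 100% duty from 0 ms until kick_ms, then the duty is lowered
    in `steps` equal steps, step_ms apart, ending at target_duty.
    """
    if not 0.0 <= target_duty <= 1.0 or steps < 1:
        raise ValueError("target_duty must be between 0 and 1 and steps >= 1")
    profile = [(0, 1.0)]
    for i in range(1, steps + 1):
        duty = 1.0 - (1.0 - target_duty) * i / steps
        profile.append((kick_ms + (i - 1) * step_ms, duty))
    return profile
```

For example, pwm_timing(25_000, 0.4) describes a 40 µs period split into 16 µs on and 24 µs off.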
But either way, you are going to need to go on site and clean and/or replace the fan. If you replace an older fan with a newer fan or vice versa, you need to go to the Device/Status tab and press the button in the fan section to force the switch to re-evaluate which fan it has. When cleaning a fan with compressed air, never allow the fan to spin freely, as the air will spin it too fast and possibly damage it. Always hold the fan while applying compressed air to prevent over-spinning it. Personally, in my 30+ years of IT work I learned this lesson the hard way: fans I allowed to over-spin while cleaning often failed within a short time, while those I kept from over-spinning did not fail quickly.