MQTT/PubSubClient problem with timers & reconnect

In order to make a more robust device, I am using a timer interrupt to drive some critical tasks. I have discovered that when I do this, the connection to Ubidots is lost fairly often. I don’t think it’s actually lost. I suspect that PubSubClient does not play well when it is being interrupted - perhaps poor timing logic?

The symptoms are that Reconnect() finds the connection down and attempts to connect again. It often takes a couple of tries before it reconnects. I know you drop the connection from time to time but this is much more frequent, and what I am seeing stops if I disable the timer interrupt.

The timer interrupt runs 5 times a second and uses at most 50 ms per run, which is 250 ms out of every 1000 ms, or 25% of the processor. If I increase this to 10 times a second the MQTT connection is “lost” quite often. This points at a timeout that is set too low, rather than an actual lost connection.
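To make the load arithmetic explicit, here is a small C++ sketch. The function name isrLoadPercent is made up for illustration, and the 10-per-second case assumes each run still takes the full 50 ms, which is a worst case rather than something measured.

```cpp
// Percentage of each second spent inside the timer interrupt.
// ratePerSecond: how many times the interrupt fires per second.
// worstCaseMs:   longest time a single run takes, in milliseconds.
constexpr double isrLoadPercent(int ratePerSecond, double worstCaseMs) {
    return ratePerSecond * worstCaseMs / 1000.0 * 100.0;
}
```

With these inputs, isrLoadPercent(5, 50.0) gives 25 and isrLoadPercent(10, 50.0) gives 50, so doubling the rate would leave only half of the processor for everything else, including the MQTT client.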

Typical reconnect times are around 2.5 seconds, but can go as high as 4.5 seconds. If it exceeds 8 seconds the watchdog timer fires and reboots the processor - which happens a few times a day.

Based on my experience so far, I am wondering how anyone can have a robust control system which uses an IOT service like this. It’s a pity there isn’t a non-blocking MQTT. I am back to thinking that I need to dedicate a processor to doing the IOT communication (Ubidots) and use a separate one for the control system.
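One partial workaround is to space out the connect attempts so that loop() is not stuck retrying back to back. The sketch below is only an illustration of that idea: ReconnectGate and its members are hypothetical names, and it does not make connect() itself non-blocking, so a single slow attempt can still hold up loop(). PubSubClient's bundled mqtt_reconnect_nonblocking example is built on the same idea.

```cpp
#include <cstdint>

// Hypothetical helper: decides when loop() may try another MQTT connect,
// so failed attempts are spaced out instead of retried immediately.
class ReconnectGate {
public:
    explicit ReconnectGate(uint32_t intervalMs) : intervalMs_(intervalMs) {}

    // nowMs is the current time, e.g. from millis().
    // Returns true at most once per interval; the first call always returns true.
    bool shouldAttempt(uint32_t nowMs) {
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        if (started_ && (nowMs - lastAttemptMs_) < intervalMs_) {
            return false;
        }
        started_ = true;
        lastAttemptMs_ = nowMs;
        return true;
    }

private:
    uint32_t intervalMs_;
    uint32_t lastAttemptMs_ = 0;
    bool started_ = false;
};
```

In a sketch this would sit in loop(): if the client reports it is not connected and gate.shouldAttempt(millis()) returns true, make one connect attempt; otherwise carry on with the rest of the loop and try again on a later pass.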

Every so often the reconnect attempt fails with “no socket available” - once this happens there is no recovery other than rebooting the processor. This seems to be an error in the code doing the connect - it isn’t freeing up the sockets.

Will

Hi there @wbp, I have used PubSubClient in different non-maker projects without problems, but I understand that a library may not work for every use case. Your problem seems to be related to the interrupt routine, so opening an issue on the developer’s site may give you additional hints about how to use the library with interrupts. You may also try another library, like this one. PubSubClient has worked for me over the last few years with only minor issues, but there are many others on the web that you can test and that may suit your use case.

About the sockets: any device has a maximum number of available sockets. It seems that you are not closing them properly in your routine; invoking the stop() method in your routine may make them available again.

I would like to give you additional hints, but unfortunately this kind of hardware issue is outside my support scope.

All the best

How am I not closing the sockets properly? I am following the example code provided by Ubidots. I don’t open any sockets. I gather that PubSubClient.connect() probably does, but if so, shouldn’t it also close them???

Hi there, no, it shouldn’t. The connect() method just opens a socket and connects to the broker. The examples in our docs use just one of the available sockets and keep its connection open during the routine; since we use only one socket, we do not need to close it. In fact, I have never experienced an error like the one you describe with Arduino, but I have with Particle, and the way to solve it was to invoke the stop() method of the TCP client, not that of the PubSubClient one. I suspect that the PubSubClient library does not properly implement the socket connection on the Arduino Nano, but unfortunately I do not have that device for testing.

All the best

I’m still a bit lost. I create a wifi client, then pass that to PubSubClient to create something to connect to Ubidots. In Setup() I start the WiFi and connect it to my AP. I am not repeating this process, it is done only once. I follow the example and call Reconnect() each pass thru loop() and now and then it has to reconnect. Eventually I get this error (see attached screen shot). Once this happens the reconnect attempt will no longer work, it keeps failing with “No Socket Available”.

So something is opening new sockets and eventually there aren’t any more to be had. It sounds like you are saying maybe PubSubClient is doing this, but that I should stop the TCP client - is that WiFiNINA?

Do I understand that I should be dropping the wifi connection and reestablishing it to solve this problem?

Thanks,
Will

I tried adding a line to Reconnect() to stop the WiFi Client - something that would not have occurred to me if you (jotathebest) had not suggested it, and so far I have not seen another occurrence of the “no socket available” message. It’s a bit soon to call it fixed but I am encouraged!
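For anyone following along, the change described above could look roughly like the sketch below. It is not the actual code from this project: reconnectOnce is a hypothetical helper, NetClient stands in for the WiFi client and MqttClient for PubSubClient, and passing the token as the user name with an empty password is an assumption based on how the Ubidots examples usually connect.

```cpp
// Hypothetical helper, written as a template so it compiles without the
// Arduino libraries. On the device, NetClient would be WiFiClient and
// MqttClient would be PubSubClient.
template <typename NetClient, typename MqttClient>
bool reconnectOnce(NetClient& net, MqttClient& mqtt,
                   const char* clientId, const char* token) {
    if (mqtt.connected()) {
        return true;  // still connected, nothing to do
    }
    // Release the socket held by the dead connection before opening a
    // new one, so repeated reconnects do not exhaust the socket pool.
    net.stop();
    // One attempt only; the caller decides when to try again.
    // Assumption: token as user name, empty password.
    return mqtt.connect(clientId, token, "");
}
```

Making a single attempt per call, rather than looping until it succeeds, keeps the time spent inside the function bounded by one connect(), which matters when a watchdog is running.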

If there aren’t any more in the next 24 hours, I’ll post what I’m using now in the hopes that it might help someone else.

Thanks!
Will

Hi there @wbp, have you experienced socket issues again? I am wondering if I can provide any additional help to troubleshoot the issue.

I will be attentive

Hello jotathebest - I have been working on this. I added a check on WiFi.status() in Reconnect() to make sure that the WiFi connection is still working. With this I discovered an error in WiFiNINA where it sometimes times out and returns an incorrect value from WiFi.status() - that’s one thing so far. I am still digging into WiFiNINA to see if I can figure out what is causing the timeout.
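Since a single WiFi.status() reading can apparently be wrong, one possible way to use the check is to require several bad readings in a row before treating the link as down. This is only a sketch of that idea: LinkMonitor and its threshold are hypothetical, and whether this kind of filtering is the right response to the WiFiNINA timeout is an assumption, not something confirmed here.

```cpp
// Hypothetical helper: counts consecutive "not connected" readings so that
// one bad reading does not trigger a full WiFi restart.
class LinkMonitor {
public:
    explicit LinkMonitor(int threshold) : threshold_(threshold) {}

    // connectedNow: on the device, the result of
    // WiFi.status() == WL_CONNECTED for this pass through loop().
    // Returns true once the link has looked down `threshold` times in a row.
    bool linkLost(bool connectedNow) {
        if (connectedNow) {
            misses_ = 0;  // any good reading resets the count
            return false;
        }
        if (misses_ < threshold_) {
            ++misses_;    // cap the count so it cannot overflow
        }
        return misses_ >= threshold_;
    }

private:
    int threshold_;
    int misses_ = 0;
};
```

With a threshold of 3, an isolated wrong reading is ignored, while a link that really is down is reported on the third consecutive pass.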

Meanwhile I have been experiencing intermittent long delays on the reconnect to Ubidots, sometimes so long that the watchdog timer, which is currently set to 8 seconds, fires and reboots the Arduino. This is a big problem as it interrupts the control processes, which must run on time.

Will