Pigpio startup error #106
I haven't encountered this specific error yet, but a listener is useless when the app crashes in an uncontrolled way. Couldn't we do a killall for the specific processes when doing a service restart?
I encountered the issue when I do npm start and then terminate with ctrl+c after several cycles. Adding these listeners seemingly solves the issue, but I would need to do more testing. They are catching the ctrl+c termination, that is for sure (SIGINT). Probably not an app crash, but that could be solved by wrapping the gpio controller startup call in a try catch, so any unhandled error is caught and logged, and then pigpio terminate can be called in the catch block. Note that the pigpio handler is a forked node process, so it probably terminates differently when I ctrl+c the main process. I will do some testing on what signals are being sent.
Actually there is an event for uncaught exceptions:

```js
process.on('uncaughtException', err => {
  console.log(`Uncaught Exception: ${err.message}`)
  process.exit(1)
})
```

So technically there is no need for a huge try catch block to handle an app crash.
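Combining that uncaughtException handler with the signal handlers, one possible sketch routes every exit path through a single guarded cleanup function (the actual pigpio call is commented out here so the sketch stands alone; assuming the pigpio npm package, as elsewhere in this thread):

```javascript
// Sketch: route signals and uncaught exceptions through one guarded cleanup.
let cleanedUp = false

function cleanup (reason) {
  if (cleanedUp) return false // never terminate the peripheral twice
  cleanedUp = true
  console.log(`shutting down (${reason})`)
  // pigpio.terminate() // real cleanup would go here
  return true
}

process.on('uncaughtException', (err) => {
  console.error(`Uncaught Exception: ${err.message}`)
  cleanup('uncaughtException')
  process.exit(1)
})

for (const signal of ['SIGHUP', 'SIGINT', 'SIGTERM']) {
  process.on(signal, () => {
    cleanup(signal)
    process.exit(0)
  })
}
```

The guard matters because a crash can trigger more than one of these paths in sequence, and terminating pigpio twice is exactly the kind of unclean state being discussed.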
I currently test with a service restart, allowing the service handler to manage a couple of things. The npm start approach is useful when the app crashes (allowing node.js to log to the console), but too often a wifi hiccup would kill the session and thus the app. I guess the ANT+ server will run in its own process as well. So keeping track of all spawned processes and cleaning up the mess when something crashes has to happen somewhere. My instinct is the service, as systemctl seems to monitor and reboot processes sometimes as well (see https://ma.ttias.be/auto-restart-crashed-service-systemd/).
I would not mix ANT+ things in here, as that is a very different kind of service. Actually it is very important that all peripherals are shut down properly, even in case of app termination or crash. In fact the signals are already being used for that purpose in my peripheral extension work in progress (for instance, if the ANT+ stick is not shut down properly by calling its close() method, it may not start on the next startup, or BLE advertisement is kept on after ORM is terminated and my watch keeps connecting to it). Anyway, for GpioTimerService.js I would imagine something like this:

```js
try {
  createGpioTimerService()
} catch (e) {
  log.error('Error when starting pigpio:', e)
  pigpio.terminate()
  process.exit(1)
}

process.on('SIGHUP', shutdown)
process.on('SIGINT', shutdown)
process.on('SIGCONT', shutdown)
process.on('SIGTERM', shutdown)

function shutdown (signal) {
  console.log(signal)
  pigpio.terminate()
  process.exit(0)
}
```

For some reason the uncaughtException event did not fire while I was quickly testing this, so I used a try/catch block instead (and I was not able to quickly find an answer as to why). I understand that one can rely on the systemd process management, but I am not sure that running ORM under a systemd service is feasible when, for instance, initial calibration is being done and logging is important (logging with systemd, as you mentioned, is not really straightforward...). So I think there is a real use case for handling pigpio termination within code we control, instead of relying solely on a different service (one that may easily be bypassed for convenience in certain cases).
I personally would not go for an automatic service restart when the app is in an unstable state. For instance, this error can only be cleared if I restart my Pi, so eventually it would just end up in a dead loop...
I did more tests and I think I understand the main problem: when I send SIGINT via ctrl+c to the main node process, it sometimes fails to stop the forked process of ./app/gpio/GpioTimerService.js. The alternative is that no additional process handling logic is included (i.e. the program remains to be run under systemd, which actually manages the shutdown of forked processes when the main process is closed), but that would leave the app with potential bugs when run standalone.
My point is that I noticed some intelligence on the systemctl part as well: after five reboots, it stops rebooting a failing process automatically (see https://freshman.tech/snippets/linux/auto-restart-systemd-service/). I don't know if that is the default config for ORM (see https://github.com/laberning/openrowingmonitor/blob/v1beta/install/openrowingmonitor.service), but that is what it did when I messed up my settings.
Yes and no, it is just that the developer is in full control, which might be handy. So you have to remember to kill the gpio-thread as well. I typically kill all node threads to clean up the mess after a crash. I normally debug via two SSH sessions: one to restart the services, and one for the app itself. But your point that someone has to clean up the mess after a reboot is a valid one. One way or another this has to be addressed. The benefit of the systemd approach is that it will get the job done, regardless of how horrific the crash of the app is. But your approach is more elegant and could indeed also allow the ANT+ stick to be reset as well.
The way the ORM service is set up, it should restart indefinitely (you can see that systemd itself shuts down the restarts due to too-quick requests, but only after trying to start 262 times :)

```
jan 02 21:08:50 rowingmonitor systemd[1]: openrowingmonitor.service: Service RestartSec=100ms expired, scheduling restart.
jan 02 21:08:50 rowingmonitor systemd[1]: openrowingmonitor.service: Scheduled restart job, restart counter is at 262.
jan 02 21:08:50 rowingmonitor systemd[1]: Stopped Open Rowing Monitor.
jan 02 21:08:50 rowingmonitor systemd[1]: openrowingmonitor.service: Start request repeated too quickly.
jan 02 21:08:50 rowingmonitor systemd[1]: openrowingmonitor.service: Failed with result 'exit-code'.
```

There is a way to set a limit for this in the service file; systemd config is quite straightforward. You can rate limit the restarts by telling systemd that if n restarts are done within time t, no more retries should be attempted. You need to add StartLimitIntervalSec and StartLimitBurst to the [Unit] part of the file:

```ini
[Unit]
StartLimitIntervalSec=1
StartLimitBurst=5
```

Basically the above tells systemd that if 5 tries fail within 1 second, it will not try a 6th time.
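Put together, the relevant parts of a unit file along the lines of openrowingmonitor.service could look roughly like this (a sketch: only the two StartLimit* values and RestartSec come from this thread; the ExecStart path and Restart= policy are assumptions for illustration):

```ini
[Unit]
Description=Open Rowing Monitor
# Give up if 5 failed starts happen within the interval below
StartLimitIntervalSec=1
StartLimitBurst=5

[Service]
# ExecStart path is an assumption for illustration
ExecStart=/usr/bin/npm start
Restart=on-failure
RestartSec=100ms
```

The key detail is placement: the StartLimit* directives belong in [Unit], while Restart= and RestartSec= belong in [Service].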
I think with a complex systemd structure it is better to use systemd notify anyway (for instance, it allows proper handling of log messages with log levels). I have never done that for nodejs, only for dotnet core (for which it is very simple). I will look into this.
Ouch.
I'll add this to the service definition, but with a StartLimitIntervalSec of 60 seconds.
Sounds great!
I added this as part of the robustness improvements with Pull Request #114, with a restart Limit Interval of 1 minute, and a burst of 5.
Sometimes when I do frequent restarts of the app I get a pigpio initialize error.
Based on some research this may be due to pigpio not shutting down cleanly. The recommendation is to add a listener to the process exit events, where

```js
pigpio.terminate()
```

should be called. Has anyone encountered this issue?