Too Many Files Open #588

Closed
faragher opened this issue Oct 19, 2024 · 4 comments · Fixed by #589
Comments

@faragher
Contributor

Describe the Bug

After ~28 days uptime, Reticulum crashes due to too many files being open, breaking ratchets and logging, and eventually crashing the entire transport instance.

To Reproduce

I don't think it's wise to wait a month to see if I can duplicate it; this is a fundamental issue.

Expected Behavior

File handles closed after use.

Logs & Screenshots

Typical:

Oct 18 05:52:07 reticulum.betweentheborders.com rnsd[553]:   File "/home/xxx/.local/lib/python3.11/site-packages/RNS/Identity.py", line 288, in persist_job
Oct 18 05:52:07 reticulum.betweentheborders.com rnsd[553]: OSError: [Errno 24] Too many open files: '/home/xxx/.reticulum/storage/ratchets/xxx>

System Information

  • Debian
  • Python 3.11.2
  • RNSD 0.7.7

Additional context

Working theory is that there's an error inside a try/except cycle which leaves a handle dangling.

Suggestion: Either migrate to with open('file.ext','r') as f: format or move file.close() out of try: and into finally:
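
A minimal sketch of the two options in the suggestion. The function names and the demo file are hypothetical and not taken from Identity.py; they only illustrate that either form closes the handle on both the success path and the error path.

```python
import os
import tempfile


def read_with_context_manager(path):
    # The with block closes the file when it exits, even if read() raises.
    with open(path, "rb") as f:
        return f.read()


def read_with_finally(path):
    # Equivalent manual form: close() lives in finally, so it runs
    # whether or not the body raised.
    f = open(path, "rb")
    try:
        return f.read()
    finally:
        f.close()


# Hypothetical stand-in for a ratchet file, created only for this demo.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "ratchet.demo")
    with open(demo_path, "wb") as out:
        out.write(b"example")
    assert read_with_context_manager(demo_path) == b"example"
    assert read_with_finally(demo_path) == b"example"
```

The with form is usually preferable because there is no separate close() call to forget.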

@faragher
Contributor Author

Okay, I found the issue. Identity.py line 332 opens a file which is never closed. I'm halfway through changing Identity.py over into with open so I'm going to finish that up and fix the problem. If that's unacceptable, then it's a quick fix closing the file that was opened on 332.

@liamcottle
Contributor

> and eventually crashing the entire transport instance.

Heh, I've noticed now and then when I get up in the morning, MeshChat is no longer running and had closed itself overnight. I hadn't added any logging for this yet, but this could be related. Seemed to only happen once a month or so... :D

Looks like your error log is related to Linux, whereas I'm running on Windows. So no idea until I investigate further, but thought I'd throw in some extra info...

@markqvist
Owner

Yep, that was a very classical oversight of forgetting to close the file handle again, and even in such an often-used function. Oof! Thanks for the report and fixing it. I'll release an update for this shortly! I'm betting it's the same cause for your crash @liamcottle.

@faragher
Contributor Author

@liamcottle I would assume (a dangerous word) Windows has a similar file handle limit, which may or may not even be the same order of magnitude as on Linux, but I'm willing to bet it would be a problem eventually. Given the similar timing, it wouldn't surprise me, though going back a few versions, I've noticed that things sometimes happened without any error logs.
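
For reference, a rough sketch of how the limit and the current handle count could be inspected from Python, assuming a Linux host like the one in the report. The `resource` module is Unix-only and `/proc/self/fd` is Linux-specific, so none of this applies to Windows as written; `open_fd_count` is a hypothetical helper name.

```python
import os
import resource  # Unix-only; not available on Windows


def open_fd_count():
    # Linux-specific: each entry in /proc/self/fd is one open descriptor
    # of the current process (including the one used to list the directory).
    return len(os.listdir("/proc/self/fd"))


# Per-process limit on open file descriptors (soft limit is the one enforced).
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft_limit}, hard limit: {hard_limit}")
print(f"currently open: {open_fd_count()}")
```

Logging the count periodically in a long-running process should show a steady climb if a handle is being leaked, well before Errno 24 appears.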

That being said, I can now target these crashes pretty well: I noticed it was down when the Amber Pages said everything was "two days ago." Gave me a ~24 hour period to look at in the logs.

So while this was a problem that has been resolved, I think I have an edge on future troubleshooting, as I think that system is on debug logging. I'll check when I switch over to the new build.
