valgrind: multiple memory leaks #170
Comments
Hi @elmeyer - Sorry for the delayed response, and thanks for the detailed issue report. I've tagged this as a bug and we'll work on fixes for these.
Hi, just wanted to add that we are seeing a memory leak with Sysmon on several Linux distributions (RHEL, CentOS, Debian). The amount of memory used by Sysmon keeps increasing until the service is restarted.
Thanks for the +1. We're working on some other issues at the moment, but we will make sure to circle back to this backlog item soon.
Hi, I would like to add us to this problem as well. We are seeing memory in the system slowly being eaten up, to the point that our monitoring alerts on high swap utilization.
Hi - Thanks for following up on this! We are working on some other priority issues at the moment, but we haven't forgotten about this. Once we've made progress on our current priorities, we'll revisit this issue and provide an update. In the meantime, feel free to share any additional details or context. Your input is much appreciated.
Hi @MarioHewardt, just a friendly follow-up on this. Could you give us an update on the fix for this bug? Any ETA? Thanks!
I have some positive updates as of this morning; maybe it helps the others. Of course, more testing is required. Scenario: RHEL 8.8 VMs in the Azure cloud, Sysmon for Linux 1.3.3. Since Friday afternoon (10 January 2025), memory usage for Sysmon has been strictly limited through the systemd service. Sysmon is behaving well and we achieved better results than we expected. In the meantime, it will be interesting to see when the open-source community resolves the memory leak in a new version of Sysmon (we have an open case with Microsoft, who told us the fix is expected around 2025 Q3).
We started with:

```
systemctl set-property sysmon.service MemoryHigh=8G MemoryMax=10G
```

I noted that such values would not be appropriate on VMs with a smaller amount of memory (8 GB), as half of it would be wasted on monitoring.
We then tried progressively tighter limits:

```
systemctl set-property sysmon.service MemoryMax=1%
systemctl set-property sysmon.service MemoryMax=0.1%
systemctl set-property sysmon.service MemoryMax=100M
```
• The process is killed if it reaches MemoryMax. • If it passes MemoryHigh, it may be throttled. The problem is that a process that passes MemoryHigh may never reach MemoryMax: if it is not actually releasing memory (a memory leak), it will just stay mostly suspended forever. • If you want the service to die once it uses a specific amount of memory, do not specify MemoryHigh at all; set only MemoryMax.
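The `set-property` calls above translate to a systemd drop-in like the following; this is a minimal sketch, and the 100M value (mirroring the example above) is an assumption that should be tuned per host:

```ini
# /etc/systemd/system/sysmon.service.d/memory.conf
[Service]
# Hard cap only: the service is OOM-killed when it exceeds the limit.
# MemoryHigh is deliberately omitted so a leaking process is killed
# (and restarted, if Restart= is set) rather than throttled forever.
MemoryMax=100M
```

Apply it with `systemctl daemon-reload` followed by `systemctl restart sysmon.service`.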
Since Friday afternoon, Sysmon simply hovers at or below 100 MB of memory usage. Once it hits the 100 MB threshold, it “cleans up” automatically, frees some memory, and continues to run without any service restart. Here is one example: Fri Jan 10 11:25:06 AEDT 2025. One can collect such data with:

```
while true; do date && systemctl show sysmon.service | grep -E "MemoryCurrent=|MemoryMax="; sleep 1; done
```
An earlier sample, before the limit was in place:

```
Mon Dec 16 13:04:56 AEDT 2024
...
27.0% /opt/sysmon/sysmon
```

There is high confidence that there is a memory leak. We have already had several incidents of Sysmon taking down VMs…
Hi all - Thanks for your patience as we've worked through this issue. The fix has been merged, and it would be great if someone could verify that their scenarios work before we go ahead and publish new packages.
Did two builds today based on the #191 commit, one on AlmaLinux 8 and one on AlmaLinux 9, and compiled RPMs for both systems (I don't know if they actually differ, but I wanted it clean). Pushed it to a machine (RHEL 8) that had major memory leak issues to test over the weekend; I will report results. I don't know what should be considered acceptable memory usage, but earlier this week that same machine went from 200-250 MB resident RAM and 0 swap after a fresh restart to 900+ MB resident RAM and 500+ MB swap 24 hours later. I did a bunch of testing on this machine earlier and came to the same conclusion and workaround as @vk2cot.
Some results from this weekend; not very detailed, only one machine and only a few measurements, checking resident RAM and swap. The first measurement is moments after a service restart.
Restarted the service after this and set up a log every 15 minutes to monitor over the next few days. This is quite a busy machine and the one we've seen the most issues with; it fills up swap the fastest.
Thanks for testing this out, appreciate it. This might be a different issue: I was able to reproduce the original issue using Valgrind, and after the fix Valgrind ran clean. Is it possible for you to run with Valgrind enabled (as per the instructions above)?
Also, in addition to a Valgrind run, it would be helpful if you could share the Sysmon config you are using.
Sadly, I don't have the option to run Valgrind on that particular machine, and while I tried to run it in a test environment, I couldn't get Valgrind to output more than just the startup messages. And sorry to say, we can't really share the config either; it's rather complex and filled with sensitive data in a secure environment. Currently this is all I can provide: measurements over a few days, and it's pretty clear that memory usage just keeps growing and swapping out RAM. Sorry that I can't give more debug logs right now; hopefully someone else can provide something more tangible.
Thanks for getting back to us. Did you try following the Valgrind instructions that the issue creator wrote down? They should enable Valgrind to trace eBPF programs. Also, could you create a new issue for your specific leak? While the fix for the original issue may not fix yours, we do have a fix for the original issue that we plan to release shortly.
Closing this issue as the original memory leak has been addressed in 1.3.4. If you encounter other leaks, please open separate issues. |
Describe the bug
On several of our machines running Sysmon for Linux, we have noticed it occupying an ever-increasing amount of memory until the OOM killer steps in and terminates the process.
To verify that these are indeed unintended leaks, we ran Sysmon for Linux under `valgrind` over the weekend. This produced the output seen under Logs.

To Reproduce
We used a build of `valgrind` at commit `ef95220ddae1af65c85d8d59a8f0dcbb9d7af90f` (https://sourceware.org/git/?p=valgrind.git;a=commit;h=ef95220ddae1af65c85d8d59a8f0dcbb9d7af90f), since the version shipped with Debian 11 crashes when encountering unsupported eBPF commands.

Sysmon version
Sysmon for Linux 1.3.2, but built from source (at tag `1.3.2.0`) with `-DCMAKE_CXX_FLAGS="-ggdb3"` to enhance `valgrind` output. `sysmon.service` was modified to run under `valgrind` as follows (this must occur at compile time, since the service definition is linked into the Sysmon binary). This causes `sysmon -i` to hang after enabling the service, but Ctrl-C'ing after that works and keeps the service running.

Distro/kernel version
Debian 11 "Bullseye", Kernel 6.1.0-18-amd64
Sysmon configuration
Logs
Expected behavior
The main issues seem to be a lack of freeing of the intermediary duplicates of `tmpStringBuffer`, e.g. in https://github.com/Sysinternals/SysmonCommon/blob/c1a02f4c73c81b591272ce927d8cf83f6edadf1b/eventsCommon.cpp#L2137, where the duplicate created by `_tcsdup` is not freed after being duplicated again in `EventDataDescCreate` (https://github.com/Sysinternals/SysmonForLinux/blob/1.3.2.0/linuxHelpers.cpp#L719). Additionally, the hash contexts created in `LinuxGetFileHash` are never destroyed.