valgrind: multiple memory leaks #170

Closed
elmeyer opened this issue Apr 8, 2024 · 15 comments
Labels: bug

elmeyer commented Apr 8, 2024

Describe the bug
On several of our machines running Sysmon for Linux, we have noticed the process occupying an ever-increasing amount of memory until the OOM killer steps in and terminates it.

To verify that these are indeed unintended leaks, we had Sysmon for Linux run under valgrind over the weekend. This produced the output seen under Logs.

To Reproduce
We used a build of valgrind at commit ef95220ddae1af65c85d8d59a8f0dcbb9d7af90f (https://sourceware.org/git/?p=valgrind.git;a=commit;h=ef95220ddae1af65c85d8d59a8f0dcbb9d7af90f), since the version shipped with Debian 11 crashes when encountering unsupported eBPF commands.

Sysmon version
Sysmon for Linux 1.3.2, but built from source (at tag 1.3.2.0) with -DCMAKE_CXX_FLAGS="-ggdb3" to enhance valgrind output.

sysmon.service was modified to run under valgrind as follows (this must occur at compile time, since the service definition is linked into the Sysmon binary):

#
#    Copyright (c) Microsoft Corporation
#
#    All rights reserved.
#
#    MIT License
#
#    Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the ""Software""), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
#
#    The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
#    THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
#

[Unit]
Description=Sysmon event logger
After=syslog.service

[Service]
Type=forking
User=root
WorkingDirectory=/opt/sysmon
ExecStart=/root/valgrind-ef95220ddae1af65c85d8d59a8f0dcbb9d7af90f/vg-in-place --trace-children=yes --smc-check=all --tool=memcheck --leak-check=yes --log-file=/var/log/valgrind/sysmon_%%p.log /opt/sysmon/sysmon -i /opt/sysmon/config.xml -service

[Install]
WantedBy=multi-user.target

With this unit, sysmon -i hangs after enabling the service; pressing Ctrl-C at that point works and leaves the service running.
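For triage, the per-PID log files written by the --log-file option above can be condensed into totals per allocation site. The following is a minimal sketch, not part of Sysmon or valgrind: it assumes the plain-text memcheck format shown under Logs, and the names `summarize`, `HEADER`, `FRAME` and `ALLOC_FUNCS` are hypothetical helpers chosen for illustration. Each loss record is attributed to the first stack frame that has a source location and is not one of the listed allocator wrappers.

```python
import re
from collections import defaultdict

# Loss-record header, e.g.
#   ==123== 16 bytes in 1 blocks are definitely lost in loss record 101 of 291
#   ==123== 12,934,204 (12,933,720 direct, 484 indirect) bytes in 179,635 blocks are definitely lost ...
HEADER = re.compile(
    r"^==\d+== ([\d,]+) (?:\([^)]*\) )?bytes in ([\d,]+) blocks are "
    r"(definitely|possibly|indirectly) lost"
)

# Stack frame with a source location: "at|by 0xADDR: function (file:line)".
# Frames of the form "(in /path/to/binary)" deliberately do not match.
FRAME = re.compile(r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (.+) \(([^()]+:\d+)\)$")

# Allocator wrappers to skip when choosing the frame a leak is attributed to
# (an assumption; extend as needed for other logs).
ALLOC_FUNCS = {"malloc", "calloc", "realloc", "strdup"}


def summarize(lines):
    """Return {(kind, "function (file:line)"): [total_bytes, total_blocks]}."""
    totals = defaultdict(lambda: [0, 0])
    pending = None  # (kind, bytes, blocks) of a record still waiting for its frame
    for raw in lines:
        line = raw.rstrip()
        header = HEADER.match(line)
        if header:
            pending = (
                header.group(3),
                int(header.group(1).replace(",", "")),
                int(header.group(2).replace(",", "")),
            )
            continue
        if pending is None:
            continue
        frame = FRAME.match(line)
        if frame and frame.group(1) not in ALLOC_FUNCS:
            kind, nbytes, nblocks = pending
            entry = totals[(kind, f"{frame.group(1)} ({frame.group(2)})")]
            entry[0] += nbytes
            entry[1] += nblocks
            pending = None
    return totals
```

Passing an open log file (for example one of the /var/log/valgrind/sysmon_<pid>.log files) to `summarize` and sorting the result by byte count gives a quick ranking of the call sites responsible for most of the lost memory.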

Distro/kernel version
Debian 11 "Bullseye", Kernel 6.1.0-18-amd64

Sysmon configuration

<Sysmon schemaversion="4.90">
    <HashAlgorithms>md5,sha1,sha256</HashAlgorithms>
    <EventFiltering>
        <RuleGroup name="ngs-pulsar ProcessTerminate include none" groupRelation="or">
            <ProcessTerminate onmatch="include"></ProcessTerminate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1021.004" groupRelation="and">
            <ProcessCreate onmatch="include">
                <Image condition="end with">ssh</Image>
                <CommandLine condition="contains">ConnectTimeout=</CommandLine>
                <CommandLine condition="contains">BatchMode=yes</CommandLine>
                <CommandLine condition="contains">StrictHostKeyChecking=no</CommandLine>
                <CommandLine condition="contains any">wget;curl</CommandLine>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1027.001" groupRelation="and">
            <ProcessCreate onmatch="include">
                <Image condition="is">/bin/dd</Image>
                <CommandLine condition="contains all">dd;if=</CommandLine>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1033" groupRelation="or">
            <ProcessCreate onmatch="include">
                <CommandLine condition="contains">/var/run/utmp</CommandLine>
                <CommandLine condition="contains">/var/log/btmp</CommandLine>
                <CommandLine condition="contains">/var/log/wtmp</CommandLine>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1053.003" groupRelation="or">
            <ProcessCreate onmatch="include">
                <Image condition="end with">crontab</Image>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1059.004" groupRelation="or">
            <ProcessCreate onmatch="include">
                <Image condition="end with">/bin/bash</Image>
                <Image condition="end with">/bin/dash</Image>
                <Image condition="end with">/bin/sh</Image>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1070.006" groupRelation="and">
            <ProcessCreate onmatch="include">
                <Image condition="is">/bin/touch</Image>
                <CommandLine condition="contains any">-r;--reference;-t;--time</CommandLine>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1087.001" groupRelation="or">
            <ProcessCreate onmatch="include">
                <CommandLine condition="contains">/etc/passwd</CommandLine>
                <CommandLine condition="contains">/etc/sudoers</CommandLine>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1105" groupRelation="or">
            <ProcessCreate onmatch="include">
                <Image condition="end with">wget</Image>
                <Image condition="end with">curl</Image>
                <Image condition="end with">ftpget</Image>
                <Image condition="end with">tftp</Image>
                <Image condition="end with">lwp-download</Image>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1123" groupRelation="and">
            <ProcessCreate onmatch="include">
                <Image condition="contains">/bin/aplay</Image>
                <CommandLine condition="contains">arecord</CommandLine>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1136.001" groupRelation="or">
            <ProcessCreate onmatch="include">
                <Image condition="end with">useradd</Image>
                <Image condition="end with">adduser</Image>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1485" groupRelation="and">
            <ProcessCreate onmatch="include">
                <Image condition="is">/bin/dd</Image>
                <CommandLine condition="contains all">dd;of=;if=</CommandLine>
                <CommandLine condition="contains any">if=/dev/zero;if=/dev/null</CommandLine>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1505.003" groupRelation="and">
            <ProcessCreate onmatch="include">
                <Image condition="contains any">whoami;ifconfig;/usr/bin/ip;/bin/uname</Image>
                <ParentImage condition="contains any">httpd;lighttpd;nginx;apache2;node;dash</ParentImage>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1543.002" groupRelation="or">
            <ProcessCreate onmatch="include">
                <Image condition="end with">systemd</Image>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1548.001" groupRelation="or">
            <ProcessCreate onmatch="include">
                <Image condition="end with">chmod</Image>
                <Image condition="end with">chown</Image>
                <Image condition="end with">fchmod</Image>
                <Image condition="end with">fchmodat</Image>
                <Image condition="end with">fchown</Image>
                <Image condition="end with">fchownat</Image>
                <Image condition="end with">fremovexattr</Image>
                <Image condition="end with">fsetxattr</Image>
                <Image condition="end with">lchown</Image>
                <Image condition="end with">lremovexattr</Image>
                <Image condition="end with">lsetxattr</Image>
                <Image condition="end with">removexattr</Image>
                <Image condition="end with">setuid</Image>
                <Image condition="end with">setgid</Image>
                <Image condition="end with">setreuid</Image>
                <Image condition="end with">setregid</Image>
            </ProcessCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1037" groupRelation="or">
            <FileCreate onmatch="include">
                <TargetFilename condition="begin with">/etc/init/</TargetFilename>
                <TargetFilename condition="begin with">/etc/init.d/</TargetFilename>
                <TargetFilename condition="begin with">/etc/rc.d/</TargetFilename>
            </FileCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1053.003" groupRelation="or">
            <FileCreate onmatch="include">
                <TargetFilename condition="is">/etc/cron.allow</TargetFilename>
                <TargetFilename condition="is">/etc/cron.deny</TargetFilename>
                <TargetFilename condition="is">/etc/crontab</TargetFilename>
                <TargetFilename condition="begin with">/etc/cron.d/</TargetFilename>
                <TargetFilename condition="begin with">/etc/cron.daily/</TargetFilename>
                <TargetFilename condition="begin with">/etc/cron.hourly/</TargetFilename>
                <TargetFilename condition="begin with">/etc/cron.monthly/</TargetFilename>
                <TargetFilename condition="begin with">/etc/cron.weekly/</TargetFilename>
                <TargetFilename condition="begin with">/var/spool/cron/crontabs/</TargetFilename>
            </FileCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1105" groupRelation="or">
            <FileCreate onmatch="include">
                <Image condition="end with">wget</Image>
                <Image condition="end with">curl</Image>
                <Image condition="end with">ftpget</Image>
                <Image condition="end with">tftp</Image>
                <Image condition="end with">lwp-download</Image>
            </FileCreate>
        </RuleGroup>
        <RuleGroup name="MitreId=T1543.002" groupRelation="or">
            <FileCreate onmatch="include">
                <TargetFilename condition="begin with">/etc/systemd/system</TargetFilename>
                <TargetFilename condition="begin with">/usr/lib/systemd/system</TargetFilename>
                <TargetFilename condition="begin with">/run/systemd/system/</TargetFilename>
                <TargetFilename condition="contains">/systemd/user/</TargetFilename>
            </FileCreate>
        </RuleGroup>
    </EventFiltering>
</Sysmon>

Logs

==3891603== Memcheck, a memory error detector
==3891603== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3891603== Using Valgrind-3.18.0.GIT and LibVEX; rerun with -h for copyright info
==3891603== Command: /opt/sysmon/sysmon -i /opt/sysmon/config.xml -service
==3891603== Parent PID: 3890757
==3891603== 
==3891603== Syscall param bpf(attr->expected_attach_type) points to uninitialised byte(s)
==3891603==    at 0x4E8C719: syscall (syscall.S:38)
==3891603==    by 0x48713E6: sys_bpf (bpf.c:75)
==3891603==    by 0x48713E6: sys_bpf_fd (bpf.c:83)
==3891603==    by 0x48713E6: sys_bpf_prog_load (bpf.c:92)
==3891603==    by 0x48764A0: probe_kern_prog_name (libbpf.c:4527)
==3891603==    by 0x4879B7E: kernel_supports (libbpf.c:4910)
==3891603==    by 0x4879B7E: kernel_supports (libbpf.c:4898)
==3891603==    by 0x487BE1C: bpf_object__create_map (libbpf.c:5034)
==3891603==    by 0x4886F25: bpf_object__create_maps (libbpf.c:5296)
==3891603==    by 0x4888202: bpf_object_load (libbpf.c:7738)
==3891603==    by 0x4888202: bpf_object__load (libbpf.c:7787)
==3891603==    by 0x486AB15: ebpfStart (telemetryLoader.c:1319)
==3891603==    by 0x486AB15: ebpfStart (telemetryLoader.c:1256)
==3891603==    by 0x486B2AC: telemetryStart (telemetryLoader.c:1510)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603==  Address 0x1ffeffe9c4 is on thread 1's stack
==3891603==  in frame #2, created by probe_kern_prog_name (libbpf.c:4510)
==3891603== 
==3891603== Syscall param bpf(attr->prog_ifindex) points to uninitialised byte(s)
==3891603==    at 0x4E8C719: syscall (syscall.S:38)
==3891603==    by 0x48713E6: sys_bpf (bpf.c:75)
==3891603==    by 0x48713E6: sys_bpf_fd (bpf.c:83)
==3891603==    by 0x48713E6: sys_bpf_prog_load (bpf.c:92)
==3891603==    by 0x48764A0: probe_kern_prog_name (libbpf.c:4527)
==3891603==    by 0x4879B7E: kernel_supports (libbpf.c:4910)
==3891603==    by 0x4879B7E: kernel_supports (libbpf.c:4898)
==3891603==    by 0x487BE1C: bpf_object__create_map (libbpf.c:5034)
==3891603==    by 0x4886F25: bpf_object__create_maps (libbpf.c:5296)
==3891603==    by 0x4888202: bpf_object_load (libbpf.c:7738)
==3891603==    by 0x4888202: bpf_object__load (libbpf.c:7787)
==3891603==    by 0x486AB15: ebpfStart (telemetryLoader.c:1319)
==3891603==    by 0x486AB15: ebpfStart (telemetryLoader.c:1256)
==3891603==    by 0x486B2AC: telemetryStart (telemetryLoader.c:1510)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603==  Address 0x1ffeffe9c0 is on thread 1's stack
==3891603==  in frame #2, created by probe_kern_prog_name (libbpf.c:4510)
==3891603== 
==3891603== Syscall param bpf(attr->value) points to uninitialised byte(s)
==3891603==    at 0x4E8C719: syscall (syscall.S:38)
==3891603==    by 0x4872241: sys_bpf (bpf.c:75)
==3891603==    by 0x4872241: bpf_map_update_elem (bpf.c:394)
==3891603==    by 0x486ABE6: ebpfStart (telemetryLoader.c:1358)
==3891603==    by 0x486ABE6: ebpfStart (telemetryLoader.c:1256)
==3891603==    by 0x486B2AC: telemetryStart (telemetryLoader.c:1510)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603==  Address 0x1ffeffee94 is on thread 1's stack
==3891603==  in frame #2, created by ebpfStart (telemetryLoader.c:1267)
==3891603== 
==3891603== Syscall param bpf(attr->value) points to uninitialised byte(s)
==3891603==    at 0x4E8C719: syscall (syscall.S:38)
==3891603==    by 0x4872241: sys_bpf (bpf.c:75)
==3891603==    by 0x4872241: bpf_map_update_elem (bpf.c:394)
==3891603==    by 0x486AC15: ebpfStart (telemetryLoader.c:1365)
==3891603==    by 0x486AC15: ebpfStart (telemetryLoader.c:1256)
==3891603==    by 0x486B2AC: telemetryStart (telemetryLoader.c:1510)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603==  Address 0x1ffeffee94 is on thread 1's stack
==3891603==  in frame #2, created by ebpfStart (telemetryLoader.c:1267)
==3891603== 
--3891603-- WARNING: unhandled eBPF command 28
--3891603-- WARNING: unhandled eBPF command 28
--3891603-- WARNING: unhandled eBPF command 28
==3891603== 
==3891603== HEAP SUMMARY:
==3891603==     in use at exit: 700,472,339 bytes in 17,382,631 blocks
==3891603==   total heap usage: 545,916,492 allocs, 528,533,861 frees, 121,240,204,927 bytes allocated
==3891603== 
==3891603== 4 bytes in 1 blocks are definitely lost in loss record 4 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x18139D: HashingValidation (parsecommandline.c:324)
==3891603==    by 0x181D57: ParseCommandLine (parsecommandline.c:952)
==3891603==    by 0x172B46: main (sysmonforlinux.c:1226)
==3891603== 
==3891603== 16 bytes in 1 blocks are definitely lost in loss record 101 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x4E298E9: strdup (strdup.c:42)
==3891603==    by 0x1844B7: ApplyConfigurationFile (xml.cpp:1485)
==3891603==    by 0x181BC6: ParseCommandLine (parsecommandline.c:868)
==3891603==    by 0x172B46: main (sysmonforlinux.c:1226)
==3891603== 
==3891603== 16 bytes in 1 blocks are definitely lost in loss record 102 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x4E298E9: strdup (strdup.c:42)
==3891603==    by 0x1844B7: ApplyConfigurationFile (xml.cpp:1485)
==3891603==    by 0x181BC6: ParseCommandLine (parsecommandline.c:868)
==3891603==    by 0x17565E: setConfigFromStoredArgv (sysmonforlinux.c:1009)
==3891603==    by 0x172E3E: main (sysmonforlinux.c:1429)
==3891603== 
==3891603== 4,160 bytes in 121 blocks are possibly lost in loss record 270 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x4E298E9: strdup (strdup.c:42)
==3891603==    by 0x18B9AC: EventResolveField (eventsCommon.cpp:2137)
==3891603==    by 0x18C469: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2441)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 4,536 bytes in 63 blocks are possibly lost in loss record 271 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x1AB34D: CRYPTO_zalloc (in /opt/sysmon/sysmon)
==3891603==    by 0x178095: LinuxGetFileHash (linuxHelpers.cpp:902)
==3891603==    by 0x18BAB1: EventResolveField (eventsCommon.cpp:1739)
==3891603==    by 0x18C469: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2441)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 4,896 bytes in 68 blocks are possibly lost in loss record 272 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x1AB34D: CRYPTO_zalloc (in /opt/sysmon/sysmon)
==3891603==    by 0x178085: LinuxGetFileHash (linuxHelpers.cpp:900)
==3891603==    by 0x18BAB1: EventResolveField (eventsCommon.cpp:1739)
==3891603==    by 0x18C469: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2441)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 5,184 bytes in 72 blocks are possibly lost in loss record 273 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x1AB34D: CRYPTO_zalloc (in /opt/sysmon/sysmon)
==3891603==    by 0x17808D: LinuxGetFileHash (linuxHelpers.cpp:901)
==3891603==    by 0x18BAB1: EventResolveField (eventsCommon.cpp:1739)
==3891603==    by 0x18C469: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2441)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 20,726 bytes in 68 blocks are possibly lost in loss record 278 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x18A3E4: ProcessCache::ProcessAdd(GUID, SYSMON_EVENT_HEADER*) (eventsCommon.cpp:366)
==3891603==    by 0x18AA71: GenerateUniquePGUID(GUID*, SYSMON_EVENT_HEADER*, bool) (eventsCommon.cpp:499)
==3891603==    by 0x18C1DC: EventResolveField (eventsCommon.cpp:1888)
==3891603==    by 0x18CA23: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2363)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 36,816 bytes in 944 blocks are possibly lost in loss record 279 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x4E298E9: strdup (strdup.c:42)
==3891603==    by 0x18B9AC: EventResolveField (eventsCommon.cpp:2137)
==3891603==    by 0x18CA44: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2371)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 37,011 bytes in 949 blocks are possibly lost in loss record 280 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x4E298E9: strdup (strdup.c:42)
==3891603==    by 0x18B9AC: EventResolveField (eventsCommon.cpp:2137)
==3891603==    by 0x18CA65: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2379)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 38,688 bytes in 992 blocks are possibly lost in loss record 281 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x4E298E9: strdup (strdup.c:42)
==3891603==    by 0x18B9AC: EventResolveField (eventsCommon.cpp:2137)
==3891603==    by 0x18CA23: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2363)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 844,498 bytes in 3,453 blocks are definitely lost in loss record 283 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x18A3E4: ProcessCache::ProcessAdd(GUID, SYSMON_EVENT_HEADER*) (eventsCommon.cpp:366)
==3891603==    by 0x18AA71: GenerateUniquePGUID(GUID*, SYSMON_EVENT_HEADER*, bool) (eventsCommon.cpp:499)
==3891603==    by 0x18C1DC: EventResolveField (eventsCommon.cpp:1888)
==3891603==    by 0x18C469: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2441)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 12,934,204 (12,933,720 direct, 484 indirect) bytes in 179,635 blocks are definitely lost in loss record 284 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x1AB34D: CRYPTO_zalloc (in /opt/sysmon/sysmon)
==3891603==    by 0x17808D: LinuxGetFileHash (linuxHelpers.cpp:901)
==3891603==    by 0x18BAB1: EventResolveField (eventsCommon.cpp:1739)
==3891603==    by 0x18C469: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2441)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 12,935,041 (12,934,512 direct, 529 indirect) bytes in 179,646 blocks are definitely lost in loss record 285 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x1AB34D: CRYPTO_zalloc (in /opt/sysmon/sysmon)
==3891603==    by 0x178095: LinuxGetFileHash (linuxHelpers.cpp:902)
==3891603==    by 0x18BAB1: EventResolveField (eventsCommon.cpp:1739)
==3891603==    by 0x18C469: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2441)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 12,935,161 (12,934,152 direct, 1,009 indirect) bytes in 179,641 blocks are definitely lost in loss record 286 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x1AB34D: CRYPTO_zalloc (in /opt/sysmon/sysmon)
==3891603==    by 0x178085: LinuxGetFileHash (linuxHelpers.cpp:900)
==3891603==    by 0x18BAB1: EventResolveField (eventsCommon.cpp:1739)
==3891603==    by 0x18C469: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2441)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 26,166,422 bytes in 1,257,826 blocks are definitely lost in loss record 287 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x4E298E9: strdup (strdup.c:42)
==3891603==    by 0x18B9AC: EventResolveField (eventsCommon.cpp:2137)
==3891603==    by 0x18C469: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2441)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 31,422,094 bytes in 117,275 blocks are definitely lost in loss record 288 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x18A3E4: ProcessCache::ProcessAdd(GUID, SYSMON_EVENT_HEADER*) (eventsCommon.cpp:366)
==3891603==    by 0x18AA71: GenerateUniquePGUID(GUID*, SYSMON_EVENT_HEADER*, bool) (eventsCommon.cpp:499)
==3891603==    by 0x18C1DC: EventResolveField (eventsCommon.cpp:1888)
==3891603==    by 0x18CA23: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2363)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 200,989,035 bytes in 5,153,565 blocks are definitely lost in loss record 289 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x4E298E9: strdup (strdup.c:42)
==3891603==    by 0x18B9AC: EventResolveField (eventsCommon.cpp:2137)
==3891603==    by 0x18CA23: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2363)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 200,991,063 bytes in 5,153,617 blocks are definitely lost in loss record 290 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x4E298E9: strdup (strdup.c:42)
==3891603==    by 0x18B9AC: EventResolveField (eventsCommon.cpp:2137)
==3891603==    by 0x18CA44: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2371)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== 200,991,063 bytes in 5,153,617 blocks are definitely lost in loss record 291 of 291
==3891603==    at 0x484079B: malloc (vg_replace_malloc.c:381)
==3891603==    by 0x4E298E9: strdup (strdup.c:42)
==3891603==    by 0x18B9AC: EventResolveField (eventsCommon.cpp:2137)
==3891603==    by 0x18CA65: EventProcess(SYSMON_EVENT_TYPE_FMT*, SYSMON_DATA_DESCRIPTOR*, SYSMON_EVENT_HEADER*, unsigned long*) (eventsCommon.cpp:2379)
==3891603==    by 0x18D5C3: DispatchEvent (eventsCommon.cpp:2900)
==3891603==    by 0x174D3F: processProcessCreate (sysmonforlinux.c:616)
==3891603==    by 0x48786B0: perf_buffer__process_record (libbpf.c:11925)
==3891603==    by 0x4878803: perf_event_read_simple.constprop.0 (libbpf.c:11553)
==3891603==    by 0x4886631: perf_buffer__process_records (libbpf.c:11947)
==3891603==    by 0x4886631: perf_buffer__poll (libbpf.c:11972)
==3891603==    by 0x486B359: telemetryStart (telemetryLoader.c:1526)
==3891603==    by 0x173768: main (sysmonforlinux.c:1668)
==3891603== 
==3891603== LEAK SUMMARY:
==3891603==    definitely lost: 700,206,595 bytes in 17,378,278 blocks
==3891603==    indirectly lost: 2,022 bytes in 24 blocks
==3891603==      possibly lost: 152,017 bytes in 3,277 blocks
==3891603==    still reachable: 109,689 bytes in 1,031 blocks
==3891603==         suppressed: 0 bytes in 0 blocks
==3891603== Reachable blocks (those to which a pointer was found) are not shown.
==3891603== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==3891603== 
==3891603== Use --track-origins=yes to see where uninitialised values come from
==3891603== For lists of detected and suppressed errors, rerun with: -s
==3891603== ERROR SUMMARY: 26 errors from 25 contexts (suppressed: 0 from 0)

Expected behavior
The main issue seems to be that the intermediate duplicates of tmpStringBuffer are never freed, e.g. in https://github.com/Sysinternals/SysmonCommon/blob/c1a02f4c73c81b591272ce927d8cf83f6edadf1b/eventsCommon.cpp#L2137, where the duplicate created by _tcsdup is not freed after being duplicated again in EventDataDescCreate (https://github.com/Sysinternals/SysmonForLinux/blob/1.3.2.0/linuxHelpers.cpp#L719). Additionally, the hash contexts created in LinuxGetFileHash are never destroyed.
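
To make the suspected pattern concrete, here is a minimal sketch of a double duplication that loses the intermediate copy, and one possible way to avoid it. This is not the Sysmon code: DataDescriptor, descCreate, descDestroy, dupTracked, freeTracked and both resolveField* functions are hypothetical stand-ins, and the counter exists only to make the leak observable. It assumes, as described above, that the descriptor-creation step makes its own copy of the string.

```cpp
#include <cassert>
#include <memory>
#include <stdlib.h>  // free
#include <string.h>  // strdup (POSIX), strcmp

// Hypothetical stand-in for an event data descriptor that owns its own copy.
struct DataDescriptor {
    char* data = nullptr;
};

// Hypothetical stand-in for the descriptor-creation step described above:
// it duplicates the input, so the caller still owns `value` afterwards.
void descCreate(DataDescriptor& desc, const char* value) {
    desc.data = strdup(value);
}

void descDestroy(DataDescriptor& desc) {
    free(desc.data);
    desc.data = nullptr;
}

// Counter used only to make the leak observable in this sketch.
int liveIntermediateCopies = 0;

char* dupTracked(const char* s) {
    ++liveIntermediateCopies;
    return strdup(s);
}

void freeTracked(char* p) {
    if (p != nullptr) {
        --liveIntermediateCopies;
        free(p);
    }
}

// Leaky pattern: the intermediate copy is duplicated again and then dropped.
void resolveFieldLeaky(DataDescriptor& desc, const char* tmpStringBuffer) {
    char* copy = dupTracked(tmpStringBuffer);
    descCreate(desc, copy);
    // `copy` goes out of scope without being freed: one lost block per call.
}

// One possible fix: tie the intermediate copy to a scope with unique_ptr.
void resolveFieldFixed(DataDescriptor& desc, const char* tmpStringBuffer) {
    std::unique_ptr<char, decltype(&freeTracked)> copy(
        dupTracked(tmpStringBuffer), &freeTracked);
    descCreate(desc, copy.get());
}  // unique_ptr calls freeTracked on the intermediate copy here

// Small self-check for the fixed variant.
void checkFixedVariant() {
    DataDescriptor desc;
    resolveFieldFixed(desc, "/usr/bin/ls");
    assert(liveIntermediateCopies == 0);            // intermediate copy released
    assert(strcmp(desc.data, "/usr/bin/ls") == 0);  // descriptor keeps its own copy
    descDestroy(desc);
}
```

Calling resolveFieldLeaky once per event leaves one unreachable block behind per call, which would be consistent with the millions of small strdup blocks in the loss records above. An explicit free after the descriptor is created would work just as well as the scoped owner shown here.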

@MarioHewardt MarioHewardt added the bug Something isn't working label May 14, 2024
@MarioHewardt
Collaborator

MarioHewardt commented May 14, 2024

Hi @elmeyer - Sorry for the delayed response and thanks for the detailed issue report. I've tagged this as a bug and we'll work on fixes for these.

@1ihor1

1ihor1 commented Aug 20, 2024

Hi, just wanted to add that we see a memory leak with Sysmon on several Linux distributions (RHEL, CentOS, Debian). The amount of memory used by Sysmon keeps increasing until the service is restarted.

@MarioHewardt
Collaborator

Thanks for the +1. We're working on some other issues at the moment, but we will make sure to circle back to this backlog item soon.

@Niklas-PDA

Hi, we are also affected by this problem.
We are running Sysmon version 1.3.3 on Red Hat Enterprise Linux 8.10 (Ootpa).

Memory on the system is slowly eaten up, to the point where our monitoring alerts on high swap utilization.
Sysmon consumes 7 of 8 GB swap and 1 of 3 GB RAM.
Restarting the sysmon service solves the problem in the short term, but after the restart Sysmon slowly consumes more and more RAM again.

@MarioHewardt
Collaborator

Hi - Thanks for following up on this! We are working on some other priority issues at the moment, but we haven't forgotten about this. Once we’ve made progress on our current priorities, we’ll revisit this issue and provide an update. In the meantime, feel free to share any additional details or context. Your input is much appreciated.

@lirick2022

Hi @MarioHewardt, just a friendly follow-up on this. Is there an update on the fix for this bug, or an ETA? Thanks!

@vk2cot

vk2cot commented Jan 12, 2025

I have some positive updates as of this morning. Maybe it helps the others. Of course, more testing is required.

Scenario: RHEL 8.8 VMs in Azure cloud. Sysmon for Linux 1.3.3.

Since Friday afternoon (10 January 2025), Sysmon's memory usage has been capped through its systemd service. Sysmon is behaving well, and the results are better than we expected.

In the meantime, it will be interesting to see when the open-source community resolves the memory leak in a new version of Sysmon (we have an open case with Microsoft, which told us that the fix is expected around 2025 Q3).

  1. Originally, we set these values:

systemctl set-property sysmon.service MemoryHigh=8G MemoryMax=10G

I pointed out that such values would not be good on VMs with a smaller amount of memory (8 GB), as it would mean that half of it would be wasted on monitoring.

  1. We then tested two other options before settling on the third, which was the most reasonable:

systemctl set-property sysmon.service MemoryMax=1%

systemctl set-property sysmon.service MemoryMax=0.1%

systemctl set-property sysmon.service MemoryMax=100M

  1. Systemd documentation was telling us that this was what we should experience:

• The process should be killed if it reaches MemoryMax.

• If it passes MemoryHigh then it may be throttled (slowed down and put under heavy memory-reclaim pressure). The problem is that if the process reaches MemoryHigh then it may never reach MemoryMax. If the process is not actually releasing memory (memory leak) then it will just stay mostly throttled forever.

• If one wants the service to die if it uses a specific amount of memory, then do not specify MemoryHigh at all, only MemoryMax.

  1. However, once we set up a limit of 100 MB for Sysmon for MemoryMax, we achieved something even better.

Since Friday afternoon, Sysmon has simply hovered at or below 100 MB of memory usage. Once it hits the 100 MB threshold, it “cleans up” automatically, frees some memory, and continues to run without any service restart. Here is one example:

Fri Jan 10 11:25:06 AEDT 2025
MemoryCurrent=104591360
MemoryMax=104857600
Fri Jan 10 11:25:07 AEDT 2025
MemoryCurrent=104591360
MemoryMax=104857600
Fri Jan 10 11:25:08 AEDT 2025
MemoryCurrent=104857600
MemoryMax=104857600
Fri Jan 10 11:25:09 AEDT 2025
MemoryCurrent=104595456
MemoryMax=104857600
Fri Jan 10 11:25:10 AEDT 2025
MemoryCurrent=104599552
MemoryMax=104857600
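
As a quick check of the numbers in this sample, here is a small sketch of the arithmetic. It relies on systemd parsing the M suffix with base 1024, so 100M means 100 MiB; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// One mebibyte in bytes; systemd size suffixes (K, M, G, T) use base 1024.
constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;

// Converts a limit given in MiB (e.g. MemoryMax=100M) to bytes.
constexpr std::uint64_t mebibytesToBytes(std::uint64_t mib) {
    return mib * kMiB;
}

// Bytes left before `current` reaches `limit` (0 if already at or above it).
constexpr std::uint64_t headroomBytes(std::uint64_t current, std::uint64_t limit) {
    return current >= limit ? 0 : limit - current;
}

// Checks the values logged above against the 100M limit.
void checkLoggedSample() {
    const std::uint64_t limit = mebibytesToBytes(100);
    assert(limit == 104857600ULL);                                   // MemoryMax as logged
    assert(headroomBytes(104591360ULL, limit) == 260ULL * 1024ULL);  // 260 KiB below the cap
    assert(headroomBytes(104857600ULL, limit) == 0);                 // exactly at the cap
}
```

In other words, the logged MemoryCurrent values sit within about 260 KiB of the limit and touch it once, which matches the "hovers below or at" description.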

One can collect such data with:

while true; do date && systemctl show sysmon.service | grep -E "MemoryCurrent=|MemoryMax="; sleep 1; done

  1. As a comparison, on another VM, which still does not have regular Sysmon service restarts, memory usage is growing so much that the VM might soon crash unless preventative action is taken beforehand (setting a systemd limit for MemoryMax, scheduling regular restarts of Sysmon, or rebooting the VM). We are using that VM to collect information for Microsoft. I created a small script to collect Sysmon memory usage every 10 minutes:

Mon Dec 16 13:04:56 AEDT 2024 ... 27.0% /opt/sysmon/sysmon

Mon Jan 13 07:35:02 AEDT 2025 ... 71.9% /opt/sysmon/sysmon

There is high confidence that there is a memory leak. We already had several incidents with Sysmon taking down VMs…

@MarioHewardt
Collaborator

Hi all - Thanks for your patience as we've worked through this issue. The fix has been merged and it would be great if someone could verify that their scenarios work before we go ahead and publish new packages.

@mag37

mag37 commented Jan 17, 2025

> Hi all - Thanks for your patience as we've worked through this issue. The fix has been merged and it would be great if someone could verify that their scenarios work before we go ahead and publish new packages.

Did two builds today based on the #191 commit. One on AlmaLinux8 and one on AlmaLinux9 and compiled RPMs for both systems (don't know if they actually differ - but wanted it clean).

Pushed it to a machine (rhel8) that had major memory leak issues to test it over the weekend - will report results.

I don't know what should be considered acceptable memory usage, but earlier this week that same machine went from 200-250 MB resident RAM and 0 swap right after a restart to 900+ MB resident RAM and 500+ MB swap 24 hours later.

I did a bunch of testing on this machine earlier and came to the same conclusion as @vk2cot. Our workaround is to set MemoryMax=350M and MemorySwapMax=400M, together with Restart=on-failure, which forces a restart of the service when it reaches the thresholds.

@mag37

mag37 commented Jan 19, 2025

Some results from this weekend, not very detailed - only one machine and only a few measurements. Checking the resident ram and swap. The first measurement is moments after a service restart.

time         ram    swap
17/1 20:30   115M   0M
18/1 12:30   793M   0M
19/1 09:30   982M   854M
19/1 20:30   899M   1443M

Restarted the service after this and set up a log every 15 minutes to monitor the next few days.

This is quite a busy machine and the one we've seen the most issues with; it fills up its swap the fastest.
But it does sadly look like we're facing the same issues as before the patch.

@MarioHewardt
Collaborator

Thanks for testing this out, appreciate it. This might be a different issue. I was able to reproduce the original issue using Valgrind and after the fix Valgrind ran clean. Is it possible for you to run with Valgrind enabled (as per above instructions)?

@MarioHewardt
Collaborator

Also, in addition to a valgrind run, if you could share the sysmon config you are using, that would be helpful.

@mag37

mag37 commented Jan 27, 2025

Sadly, I don't have the option to run Valgrind on that particular machine. I tried to run it in a test environment, but I couldn't get Valgrind to output anything beyond what it prints at startup.

And sorry to say, but we can't really share the config either - it's rather complex and filled with sensitive data in a secure environment.

Currently this is all I can provide. These are measurements over a few days, and it's pretty clear that memory usage just keeps growing and RAM keeps being swapped out:

Image

Sorry that I can't give more debug logs right now, hopefully someone else can provide something more tangible.

@MarioHewardt
Collaborator

Thanks for getting back to us. Did you try following the valgrind instructions that the issue creator wrote down? That should enable valgrind to trace eBPF programs. Also, could you create a new issue for your specific leak? While the fix for the original issue may not fix your specific leak, we do have a fix for the original issue that we plan on releasing shortly.

@MarioHewardt
Collaborator

Closing this issue as the original memory leak has been addressed in 1.3.4. If you encounter other leaks, please open separate issues.
