Inaccurate RAM usage detection with ZFS #125

Open
Chloe-ko opened this issue Oct 14, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@Chloe-ko

Chloe-ko commented Oct 14, 2024

Hello,

First off, I want to say that I absolutely love Komodo. It's by far the best Docker(-Compose) deployment tool I've ever used. Thank you for this software!

Now to the issue: Komodo persistently gives me this alert:
[Screenshot: firefox_jTM2U6ZByI]

I knew this was wrong and that I am not using this much memory, so I went to investigate.
To my surprise, free -h actually shows the same usage!
[Screenshot: WindowsTerminal_zsEmIoDCvw]
At first this had me worried, but when I followed up with htop and btop, they reported different (and more reasonable) usage:
[Screenshot: ApplicationFrameHost_4vgqZXAHEi]

I investigated further and eventually found the reason for this discrepancy: ZFS, the file system.
ZFS becomes very memory-intensive at higher capacities (I have a total of 82TB of storage on ZFS) because it uses the ZFS ARC (Adaptive Replacement Cache), a cache that is kept in memory.

Just like Linux itself, the ZFS ARC splits its RAM usage into "used" and "available": used memory is what it actually needs, while available memory is currently occupied but can be released to the user at any time if needed.
(I'll assume you already know this topic, but if not, here's a small tl;dr: https://www.linuxatemyram.com/)

I confirmed this using btop, which actually has a setting for this:
[Screenshot: WindowsTerminal_dhZYO4d5Jz]

While this is set to true, btop shows the "correct" RAM usage of 64GB used, 61GB available.
Setting this to false makes btop reflect what Komodo and free -h show: 119GB used, ~6GB available.

I double-checked on my other servers, where ZFS is not in use: there Komodo correctly counts only the used RAM as used, so this is confirmed to be a ZFS-specific issue.

Generally speaking, Available RAM should be considered "free" RAM from a user perspective.
It'd be great if this behavior could be adjusted as otherwise I have an alert active 24/7, thank you :)

@mbecker20 mbecker20 added the bug Something isn't working label Oct 14, 2024
@mbecker20
Member

Thanks for the detailed report!

Currently the used_memory comes from this method: https://docs.rs/sysinfo/0.32.0/sysinfo/struct.System.html#method.used_memory

It seems this just takes total - free (not available) memory. I agree that from a user perspective, available memory is usually the important metric, not free memory. So in 1.15.9, used_memory will come from total_memory - available_memory. This should resolve the open alert issue if available memory is reported as larger than free memory.

Since some users may care about free_memory as well, I've also added another field, mem_free_gb. This makes all of the data available:

  • Free mem: mem_free_gb
  • Available mem: mem_total_gb - mem_used_gb
  • Used mem: mem_used_gb (the previous, free-based figure is mem_total_gb - mem_free_gb)
  • Total mem: mem_total_gb
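
To make the relationship between these fields concrete, here is a minimal Python sketch that derives them from /proc/meminfo-style text. It is not Komodo's implementation (Komodo gets its numbers from the sysinfo crate); the function names, the sample values, and the choice to divide by 1024**3 are all assumptions made for illustration.

```python
GIB = 1024 ** 3  # assumption: "gb" in the field names is treated as GiB here


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: value in bytes}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if not parts:
            continue
        # Most fields carry a "kB" unit; fields without a unit are plain counts.
        factor = 1024 if len(parts) > 1 and parts[1] == "kB" else 1
        values[name.strip()] = int(parts[0]) * factor
    return values


def memory_summary_gb(meminfo: dict[str, int]) -> dict[str, float]:
    """Build the four figures from the list above, with used = total - available."""
    total = meminfo["MemTotal"]
    free = meminfo["MemFree"]
    available = meminfo["MemAvailable"]
    used = total - available
    return {
        "mem_total_gb": total / GIB,
        "mem_free_gb": free / GIB,
        "mem_used_gb": used / GIB,
        "mem_available_gb": (total - used) / GIB,  # i.e. mem_total_gb - mem_used_gb
    }


# Invented sample values, not taken from the reporter's machine.
SAMPLE = """\
MemTotal:       134217728 kB
MemFree:          6291456 kB
MemAvailable:    10485760 kB
"""

if __name__ == "__main__":
    print(memory_summary_gb(parse_meminfo(SAMPLE)))
```

On a real Linux host the same functions could be fed the contents of /proc/meminfo instead of SAMPLE.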

@mbecker20 mbecker20 added done 1.15.9 and removed bug Something isn't working labels Oct 15, 2024
@Chloe-ko
Author

Chloe-ko commented Oct 15, 2024

Hey, thanks for the quick action!

Sadly I have to report that this didn't help in my specific case.

As you can see here, it still thinks that my server is at 90%+ RAM usage:
[Screenshot: firefox_iuCBykSvd2]

And just as a sanity check, at the same time, btop reports:
[Screenshot: WindowsTerminal_Zf3Qeaka25]

This is a very ZFS-specific issue, though, and I assume not an issue for most users.

I did some more research, and the issue seems to be that most filesystems rely on the Linux kernel's built-in file cache (the page cache), whose memory the kernel can correctly classify as available rather than used.
ZFS implements its own in-memory file cache (the ARC), so the kernel sees all of the memory ZFS holds as "used" rather than available, regardless of whether ZFS could free it.

The only workaround I've found is for the application itself to read the ZFS ARC statistics and subtract the ARC from used memory. There doesn't seem to be a way to get this from built-in commands that only query the kernel, as the kernel doesn't know about ZFS internals.

I've found this which gives some valuable hints: https://superuser.com/a/1137417
I've tried checking the kernel stats from that answer and can confirm that cat /proc/spl/kstat/zfs/arcstats does output info about the ARC state even from inside the komodo/periphery container, so this is a resource that could be used whether in a bare-metal install or a containerized deployment.

The most valuable part of the output seems to be the fields c, c_min, c_max and size.
The superuser answer already elaborates on most of these:

  • c is the target size of the ARC in bytes
  • c_max is the maximum size of the ARC in bytes
  • size is the current size of the ARC in bytes

However, the answer doesn't explain c_min. As the name implies, it is the minimum size that the ARC will shrink to. c_min is, essentially, the "used" memory for ZFS, as this seems to be the amount of memory that ZFS will not free up for the user.

The "available" memory for ZFS could then be calculated as size - c_min: as I understand it, subtracting the amount ZFS won't free up from the total amount it currently uses leaves the amount it will free up.
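
As a rough illustration of the size - c_min idea, the following Python sketch parses arcstats-style text and computes the reclaimable part of the ARC. The function names and the sample numbers are invented for this example, and treating size - c_min as reclaimable is the assumption described above, not a documented ZFS guarantee.

```python
def parse_arcstats(text: str) -> dict[str, int]:
    """Parse kstat text (name, type, data columns) into {name: value}."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three columns with a numeric last column;
        # this skips the kstat header line and the "name type data" row.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def arc_reclaimable_bytes(stats: dict[str, int]) -> int:
    """Bytes the ARC could give back under the size - c_min assumption."""
    # Clamp at zero in case size is momentarily below c_min.
    return max(stats["size"] - stats["c_min"], 0)


# Invented sample in the arcstats layout: 4 GiB c_min, 64 GiB c_max, 60 GiB size.
SAMPLE = """\
13 1 0x01 3 816 1000000 2000000
name                            type data
c                               4    64424509440
c_min                           4    4294967296
c_max                           4    68719476736
size                            4    64424509440
"""

if __name__ == "__main__":
    stats = parse_arcstats(SAMPLE)
    print(arc_reclaimable_bytes(stats) / 1024 ** 3, "GiB reclaimable")
```

On a ZFS host, the text would come from /proc/spl/kstat/zfs/arcstats rather than from SAMPLE.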

Using awk, the values can be extracted in gigabytes like this. The arcstats output is in bytes, and the / 1073741824 converts it to gigabytes (GiB, strictly speaking); if the calculation should be done in bytes before converting to GB, that division can be removed:
Current size of the entire ZFS ARC (zfs_size_gb): awk '/^size/ { print $3 / 1073741824 }' < /proc/spl/kstat/zfs/arcstats
Minimum size of ARC (zfs_min_c_gb): awk '/^c_min/ { print $3 / 1073741824 }' < /proc/spl/kstat/zfs/arcstats

Given that, the available memory for a machine could be calculated as:
mem_total_gb - (mem_used_gb - zfs_size_gb + zfs_min_c_gb)
or
mem_total_gb - mem_used_gb + zfs_size_gb - zfs_min_c_gb

Of course, it would be reasonable to first check whether ZFS is in use at all (by checking if /proc/spl/kstat/zfs/arcstats exists). Still, I think this would make sense to implement, given that ZFS is a commonly used filesystem.
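
Putting the formula and the existence check together, a sketch in Python could look like the following. The helper names are hypothetical, the numbers in the usage lines are invented, and falling back to plain mem_total_gb - mem_used_gb when arcstats is missing or incomplete is an assumption about the desired behavior, not something Komodo does today.

```python
from pathlib import Path

ARCSTATS_PATH = Path("/proc/spl/kstat/zfs/arcstats")
GIB = 1024 ** 3


def read_arc_gb(path=ARCSTATS_PATH):
    """Return (zfs_size_gb, zfs_min_c_gb), or None if ZFS stats are unavailable."""
    try:
        text = Path(path).read_text()
    except OSError:
        return None  # file missing or unreadable: treat ZFS as not in use
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    if "size" not in stats or "c_min" not in stats:
        return None
    return stats["size"] / GIB, stats["c_min"] / GIB


def available_gb(mem_total_gb, mem_used_gb, arc=None):
    """mem_total_gb - mem_used_gb + zfs_size_gb - zfs_min_c_gb, when ARC data exists."""
    available = mem_total_gb - mem_used_gb
    if arc is not None:
        zfs_size_gb, zfs_min_c_gb = arc
        available += max(zfs_size_gb - zfs_min_c_gb, 0.0)
    # Never report more available memory than the machine has.
    return min(available, mem_total_gb)


if __name__ == "__main__":
    # Invented numbers: 128 GB total, 118 GB "used", 60 GB ARC with a 4 GB minimum.
    print(available_gb(128.0, 118.0))               # without ZFS data
    print(available_gb(128.0, 118.0, (60.0, 4.0)))  # with the ARC adjustment
    print(read_arc_gb())                            # None on hosts without ZFS
```

Keeping the file read separate from the arithmetic means the adjustment can be exercised with known values, and a host without ZFS simply gets the unadjusted figure.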

@mbecker20 mbecker20 added bug Something isn't working and removed done 1.15.9 labels Oct 18, 2024
@mbecker20 mbecker20 changed the title Inaccurate RAM usage detection Inaccurate RAM usage detection with ZFS Oct 18, 2024