
CGROUP aware resource monitor on memory #38718

Open
wujiaqi opened this issue Mar 12, 2025 · 3 comments
Labels: area/overload_manager, enhancement

Comments


wujiaqi commented Mar 12, 2025

Title: Add a CGROUP aware resource monitor for memory

Description:
I'm opening this issue to have a preliminary discussion on how to implement this. Someone on my team can do the implementation once we get agreement.

We run an Istio Ingress Gateway today with the overload manager configured to load-shed on memory utilization thresholds. This is to prevent OOMKills of our pods, especially during high-load events. However, the fixed_heap resource monitor that exists today only reports the memory that tcmalloc believes is allocated. OOMKills are based on what the OS sees, not what tcmalloc thinks, so it is important to have a monitor that reflects the OS view. fixed_heap is often substantially lower than what is reported in cgroups.

Below is an experiment I conducted to demonstrate the discrepancy.

During Load
Docker stats

CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT   MEM %
2696a94996b9   envoy     50.56%     489.5MiB / 512MiB  95.61%

Envoy metric

overload.envoy.resource_monitors.fixed_heap.pressure: 87

After Load
Docker stats

CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT   MEM %
2696a94996b9   envoy     0.48%     343.1MiB / 512MiB   67.01%

Envoy metric

overload.envoy.resource_monitors.fixed_heap.pressure: 16

As you can see, heap pressure is much lower than the OS-reported memory consumption.

I am proposing to add a new memory resource monitor based on cgroups rather than tcmalloc stats. Since systems are currently in transition, with some on cgroups v1, others on cgroups v2, and some in hybrid mode, it would be worth abstracting this detail away in the configuration to just "cgroups enabled". During object construction we can detect whether the system is on cgroups v1 or v2, for example by checking the filesystem for the presence of the hierarchies:

If the following files are present, the system is on cgroups v2:

  • /sys/fs/cgroup/memory.max
  • /sys/fs/cgroup/memory.current

Otherwise, if the following directory exists, the system is on cgroups v1:

  • /sys/fs/cgroup/memory

We will pick the highest available cgroup version on the system during construction.
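
To make the detection concrete, below is a minimal standalone C++ sketch (illustrative only, not Envoy code; CgroupVersion and detectCgroupVersion are hypothetical names) of the probe described above:

#include <filesystem>
#include <iostream>

// Hypothetical helper: probe the unified (v2) hierarchy first, then fall
// back to the legacy (v1) per-controller memory directory.
enum class CgroupVersion { None, V1, V2 };

CgroupVersion detectCgroupVersion() {
  namespace fs = std::filesystem;
  // cgroups v2: memory.max / memory.current live directly under /sys/fs/cgroup.
  if (fs::exists("/sys/fs/cgroup/memory.max") &&
      fs::exists("/sys/fs/cgroup/memory.current")) {
    return CgroupVersion::V2;
  }
  // cgroups v1: a dedicated memory controller directory exists instead.
  if (fs::is_directory("/sys/fs/cgroup/memory")) {
    return CgroupVersion::V1;
  }
  return CgroupVersion::None;
}

int main() {
  switch (detectCgroupVersion()) {
  case CgroupVersion::V2: std::cout << "cgroups v2\n"; break;
  case CgroupVersion::V1: std::cout << "cgroups v1\n"; break;
  default: std::cout << "no memory cgroup hierarchy found\n"; break;
  }
  return 0;
}

Checking v2 before v1 means that on a hybrid system with both hierarchies mounted, the monitor would prefer the v2 files, which matches the "highest available" rule above.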

Appreciate the feedback, thanks.

Relevant Links:

Related issue: #36681

cc @ramaraochavali

@wujiaqi added the enhancement and triage labels on Mar 12, 2025
@botengyao added the area/overload_manager label and removed the triage label on Mar 13, 2025
@botengyao (Member)

+@KBaichoo

Thanks @wujiaqi, this makes sense to me, and a cgroup-version-aware memory_utilization resource monitor can be added.

A cgroup-based CPU resource monitor was added recently; you can take a reference from #34713.
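
For illustration, here is a rough standalone sketch of the pressure calculation such a monitor could perform on a cgroups v2 system (hypothetical code, not the extension's actual implementation; a real monitor would follow the resource monitor extension pattern referenced above, and would read the v1 files memory.usage_in_bytes and memory.limit_in_bytes when v1 is detected):

#include <fstream>
#include <iostream>
#include <optional>
#include <string>

// Hypothetical helper: compute pressure as current usage divided by the
// cgroup memory limit, mirroring what docker stats reports as MEM %.
std::optional<double> computeCgroupV2MemoryPressure() {
  std::ifstream max_file("/sys/fs/cgroup/memory.max");
  std::ifstream current_file("/sys/fs/cgroup/memory.current");
  std::string max_str;
  unsigned long long current = 0;
  if (!(max_file >> max_str) || !(current_file >> current)) {
    return std::nullopt; // hierarchy not present or files unreadable
  }
  // "max" means the cgroup has no memory limit, so there is no meaningful
  // pressure to report against.
  if (max_str == "max") {
    return std::nullopt;
  }
  const unsigned long long limit = std::stoull(max_str);
  if (limit == 0) {
    return std::nullopt;
  }
  return static_cast<double>(current) / static_cast<double>(limit);
}

int main() {
  if (const auto pressure = computeCgroupV2MemoryPressure()) {
    std::cout << "cgroup memory pressure: " << *pressure << "\n";
  } else {
    std::cout << "no cgroup memory limit to report against\n";
  }
  return 0;
}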

@KBaichoo (Contributor)

Docker stats

CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT   MEM %
2696a94996b9   envoy     0.48%     343.1MiB / 512MiB   67.01%

Envoy metric

overload.envoy.resource_monitors.fixed_heap.pressure: 16

See also https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/bootstrap/v3/bootstrap.proto#config-bootstrap-v3-memoryallocatormanager as a way to configure giving back some memory to the OS.

A cgroup aware resource monitor sounds like a great enhancement!


wujiaqi commented Mar 13, 2025

I did happen to test that as well; it works nicely. After an idle period the memory gets released.

CONTAINER ID   NAME      CPU %     MEM USAGE / LIMIT   MEM %
2696a94996b9   envoy     0.65%     87.49MiB / 512MiB   17.09%
overload.envoy.resource_monitors.fixed_heap.pressure: 14
tcmalloc.released_by_timer: 92

Though what I don't understand is how to make a judgment call on the value to set for bytes_to_release. I tried finding some literature in the tcmalloc docs, but it wasn't super clear to me; it gave some insights on memory fragmentation and so on. I would appreciate any insight.

memory_allocator_manager:
  bytes_to_release: 31460000 # arbitrarily chose ~30MB
