mm/oom_kill: count global and memory cgroup oom kills
Show count of oom killer invocations in /proc/vmstat and count of
processes killed in memory cgroup in knob "memory.events" (in
memory.oom_control for v1 cgroup).

Also describe difference between "oom" and "oom_kill" in memory cgroup
documentation.  Currently oom in memory cgroup kills tasks iff shortage
has happened inside page fault.

These counters help in monitoring oom kills - for now the only way is
grepping for magic words in kernel log.

[[email protected]: fix for mem_cgroup_count_vm_event() rename]
[[email protected]: fix comment, per Konstantin]
Link: http://lkml.kernel.org/r/149570810989.203600.9492483715840752937.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Roman Guschin <[email protected]>
Cc: David Rientjes <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
koct9i authored and torvalds committed Jul 6, 2017
1 parent 2262185 commit 8e675f7
Showing 6 changed files with 29 additions and 5 deletions.
20 changes: 16 additions & 4 deletions Documentation/cgroup-v2.txt
@@ -852,13 +852,25 @@ PAGE_SIZE multiple when read back.

The number of times the cgroup's memory usage was
about to go over the max boundary. If direct reclaim
-fails to bring it down, the OOM killer is invoked.
+fails to bring it down, the cgroup goes to OOM state.

oom

-The number of times the OOM killer has been invoked in
-the cgroup. This may not exactly match the number of
-processes killed but should generally be close.
+The number of time the cgroup's memory usage was
+reached the limit and allocation was about to fail.
+
+Depending on context result could be invocation of OOM
+killer and retrying allocation or failing alloction.
+
+Failed allocation in its turn could be returned into
+userspace as -ENOMEM or siletly ignored in cases like
+disk readahead. For now OOM in memory cgroup kills
+tasks iff shortage has happened inside page fault.
+
+oom_kill
+
+The number of processes belonging to this cgroup
+killed by any kind of OOM killer.

memory.stat

5 changes: 4 additions & 1 deletion include/linux/memcontrol.h
@@ -582,8 +582,11 @@ static inline void count_memcg_event_mm(struct mm_struct *mm,

rcu_read_lock();
memcg = mem_cgroup_from_task(rcu_dereference(mm->owner));
-if (likely(memcg))
+if (likely(memcg)) {
this_cpu_inc(memcg->stat->events[idx]);
+if (idx == OOM_KILL)
+cgroup_file_notify(&memcg->events_file);
+}
rcu_read_unlock();
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
1 change: 1 addition & 0 deletions include/linux/vm_event_item.h
@@ -41,6 +41,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
PAGEOUTRUN, PGROTATED,
DROP_PAGECACHE, DROP_SLAB,
+OOM_KILL,
#ifdef CONFIG_NUMA_BALANCING
NUMA_PTE_UPDATES,
NUMA_HUGE_PTE_UPDATES,
2 changes: 2 additions & 0 deletions mm/memcontrol.c
@@ -3573,6 +3573,7 @@ static int mem_cgroup_oom_control_read(struct seq_file *sf, void *v)

seq_printf(sf, "oom_kill_disable %d\n", memcg->oom_kill_disable);
seq_printf(sf, "under_oom %d\n", (bool)memcg->under_oom);
+seq_printf(sf, "oom_kill %lu\n", memcg_sum_events(memcg, OOM_KILL));
return 0;
}

@@ -5164,6 +5165,7 @@ static int memory_events_show(struct seq_file *m, void *v)
seq_printf(m, "high %lu\n", memcg_sum_events(memcg, MEMCG_HIGH));
seq_printf(m, "max %lu\n", memcg_sum_events(memcg, MEMCG_MAX));
seq_printf(m, "oom %lu\n", memcg_sum_events(memcg, MEMCG_OOM));
+seq_printf(m, "oom_kill %lu\n", memcg_sum_events(memcg, OOM_KILL));

return 0;
}
5 changes: 5 additions & 0 deletions mm/oom_kill.c
@@ -876,6 +876,11 @@ static void oom_kill_process(struct oom_control *oc, const char *message)
/* Get a reference to safely compare mm after task_unlock(victim) */
mm = victim->mm;
mmgrab(mm);
+
+/* Raise event before sending signal: task reaper must see this */
+count_vm_event(OOM_KILL);
+count_memcg_event_mm(mm, OOM_KILL);
+
/*
* We should send SIGKILL before setting TIF_MEMDIE in order to prevent
* the OOM victim from depleting the memory reserves from the user
1 change: 1 addition & 0 deletions mm/vmstat.c
@@ -1018,6 +1018,7 @@ const char * const vmstat_text[] = {

"drop_pagecache",
"drop_slab",
+"oom_kill",

#ifdef CONFIG_NUMA_BALANCING
"numa_pte_updates",
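
A minimal userspace sketch of how the new counter might be read instead of grepping the kernel log. It assumes the "name value" line layout that the seq_printf() calls above emit for memory.events and that /proc/vmstat also uses. The helper names parse_counter() and read_counter() and the 16 KiB buffer size are illustrative assumptions, not part of this commit.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan "name value" lines and store the value found for `key` in *out.
 * Returns 0 on success, -1 if the key is absent.
 * (Hypothetical helper, not a kernel or libc API.)
 */
int parse_counter(const char *text, const char *key, unsigned long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Require a space after the name so "oom" does not match "oom_kill". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ' &&
		    sscanf(line + klen, "%lu", out) == 1)
			return 0;
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Read a small text file into a fixed buffer and look up one counter.
 * Assumes the file fits in 16 KiB; longer files are truncated.
 */
int read_counter(const char *path, const char *key, unsigned long *out)
{
	char buf[16384];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_counter(buf, key, out);
}
```

With these helpers, read_counter("/proc/vmstat", "oom_kill", &n) would fetch the global count. The same call on a cgroup's memory.events file would fetch the per-cgroup count. The path to that file depends on where the cgroup hierarchy is mounted.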