Skip to content

Commit

Permalink
Merge branch 'akpm' (Andrew's patch-bomb)
Browse files Browse the repository at this point in the history
Merge misc VM changes from Andrew Morton:
 "The rest of most-of-MM.  The other MM bits await a slab merge.

  This patch includes the addition of a huge zero_page.  Not a
  performance boost but it an save large amounts of physical memory in
  some situations.

  Also a bunch of Fujitsu engineers are working on memory hotplug.
  Which, as it turns out, was badly broken.  About half of their patches
  are included here; the remainder are 3.8 material."

However, this merge disables CONFIG_MOVABLE_NODE, which was totally
broken.  We don't add new features with "default y", nor do we add
Kconfig questions that are incomprehensible to most people without any
help text.  Does the feature even make sense without compaction or
memory hotplug?

* akpm: (54 commits)
  mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
  mm/memory.c: remove unused code from do_wp_page()
  asm-generic, mm: pgtable: consolidate zero page helpers
  mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage
  hwpoison, hugetlbfs: fix RSS-counter warning
  hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage
  mm: protect against concurrent vma expansion
  memcg: do not check for mm in __mem_cgroup_count_vm_event
  tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
  mm: provide more accurate estimation of pages occupied by memmap
  fs/buffer.c: remove redundant initialization in alloc_page_buffers()
  fs/buffer.c: do not inline exported function
  writeback: fix a typo in comment
  mm: introduce new field "managed_pages" to struct zone
  mm, oom: remove statically defined arch functions of same name
  mm, oom: remove redundant sleep in pagefault oom handler
  mm, oom: cleanup pagefault oom handler
  memory_hotplug: allow online/offline memory to result movable node
  numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
  mm, memcg: avoid unnecessary function call when memcg is disabled
  ...
  • Loading branch information
torvalds committed Dec 13, 2012
2 parents 193c0d6 + 9887090 commit f6e858a
Show file tree
Hide file tree
Showing 52 changed files with 996 additions and 422 deletions.
2 changes: 1 addition & 1 deletion Documentation/cgroups/cpusets.txt
Original file line number Diff line number Diff line change
Expand Up @@ -218,7 +218,7 @@ and name space for cpusets, with a minimum of additional kernel code.
The cpus and mems files in the root (top_cpuset) cpuset are
read-only. The cpus file automatically tracks the value of
cpu_online_mask using a CPU hotplug notifier, and the mems file
automatically tracks the value of node_states[N_HIGH_MEMORY]--i.e.,
automatically tracks the value of node_states[N_MEMORY]--i.e.,
nodes with memory--using the cpuset_track_online_nodes() hook.


Expand Down
5 changes: 4 additions & 1 deletion Documentation/memory-hotplug.txt
Original file line number Diff line number Diff line change
Expand Up @@ -390,14 +390,17 @@ struct memory_notify {
unsigned long start_pfn;
unsigned long nr_pages;
int status_change_nid_normal;
int status_change_nid_high;
int status_change_nid;
}

start_pfn is start_pfn of online/offline memory.
nr_pages is # of pages of online/offline memory.
status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
is (will be) set/clear, if this is -1, then nodemask status is not changed.
status_change_nid is set node id when N_HIGH_MEMORY of nodemask is (will be)
status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
is (will be) set/clear, if this is -1, then nodemask status is not changed.
status_change_nid is set node id when N_MEMORY of nodemask is (will be)
set/clear. It means a new(memoryless) node gets new memory by online and a
node loses all memory. If this is -1, then nodemask status is not changed.
If status_changed_nid* >= 0, callback should create/discard structures for the
Expand Down
19 changes: 17 additions & 2 deletions Documentation/vm/transhuge.txt
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,13 @@ echo always >/sys/kernel/mm/transparent_hugepage/defrag
echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
echo never >/sys/kernel/mm/transparent_hugepage/defrag

By default kernel tries to use huge zero page on read page fault.
It's possible to disable huge zero page by writing 0 or enable it
back by writing 1:

echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page
echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page

khugepaged will be automatically started when
transparent_hugepage/enabled is set to "always" or "madvise, and it'll
be automatically shutdown if it's set to "never".
Expand Down Expand Up @@ -197,6 +204,14 @@ thp_split is incremented every time a huge page is split into base
pages. This can happen for a variety of reasons but a common
reason is that a huge page is old and is being reclaimed.

thp_zero_page_alloc is incremented every time a huge zero page is
successfully allocated. It includes allocations which where
dropped due race with other allocation. Note, it doesn't count
every map of the huge zero page, only its allocation.

thp_zero_page_alloc_failed is incremented if kernel fails to allocate
huge zero page and falls back to using small pages.

As the system ages, allocating huge pages may be expensive as the
system uses memory compaction to copy data around memory to free a
huge page for use. There are some counters in /proc/vmstat to help
Expand Down Expand Up @@ -276,7 +291,7 @@ unaffected. libhugetlbfs will also work fine as usual.
== Graceful fallback ==

Code walking pagetables but unware about huge pmds can simply call
split_huge_page_pmd(mm, pmd) where the pmd is the one returned by
split_huge_page_pmd(vma, addr, pmd) where the pmd is the one returned by
pmd_offset. It's trivial to make the code transparent hugepage aware
by just grepping for "pmd_offset" and adding split_huge_page_pmd where
missing after pmd_offset returns the pmd. Thanks to the graceful
Expand All @@ -299,7 +314,7 @@ diff --git a/mm/mremap.c b/mm/mremap.c
return NULL;

pmd = pmd_offset(pud, addr);
+ split_huge_page_pmd(mm, pmd);
+ split_huge_page_pmd(vma, addr, pmd);
if (pmd_none_or_clear_bad(pmd))
return NULL;

Expand Down
11 changes: 1 addition & 10 deletions arch/mips/include/asm/pgtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -76,16 +76,7 @@ extern unsigned long zero_page_mask;

#define ZERO_PAGE(vaddr) \
(virt_to_page((void *)(empty_zero_page + (((unsigned long)(vaddr)) & zero_page_mask))))

#define is_zero_pfn is_zero_pfn
static inline int is_zero_pfn(unsigned long pfn)
{
extern unsigned long zero_pfn;
unsigned long offset_from_zero_pfn = pfn - zero_pfn;
return offset_from_zero_pfn <= (zero_page_mask >> PAGE_SHIFT);
}

#define my_zero_pfn(addr) page_to_pfn(ZERO_PAGE(addr))
#define __HAVE_COLOR_ZERO_PAGE

extern void paging_init(void);

Expand Down
27 changes: 12 additions & 15 deletions arch/powerpc/mm/fault.c
Original file line number Diff line number Diff line change
Expand Up @@ -113,19 +113,6 @@ static int store_updates_sp(struct pt_regs *regs)
#define MM_FAULT_CONTINUE -1
#define MM_FAULT_ERR(sig) (sig)

static int out_of_memory(struct pt_regs *regs)
{
/*
* We ran out of memory, or some other thing happened to us that made
* us unable to handle the page fault gracefully.
*/
up_read(&current->mm->mmap_sem);
if (!user_mode(regs))
return MM_FAULT_ERR(SIGKILL);
pagefault_out_of_memory();
return MM_FAULT_RETURN;
}

static int do_sigbus(struct pt_regs *regs, unsigned long address)
{
siginfo_t info;
Expand Down Expand Up @@ -169,8 +156,18 @@ static int mm_fault_error(struct pt_regs *regs, unsigned long addr, int fault)
return MM_FAULT_CONTINUE;

/* Out of memory */
if (fault & VM_FAULT_OOM)
return out_of_memory(regs);
if (fault & VM_FAULT_OOM) {
up_read(&current->mm->mmap_sem);

/*
* We ran out of memory, or some other thing happened to us that
* made us unable to handle the page fault gracefully.
*/
if (!user_mode(regs))
return MM_FAULT_ERR(SIGKILL);
pagefault_out_of_memory();
return MM_FAULT_RETURN;
}

/* Bus error. x86 handles HWPOISON here, we'll add this if/when
* we support the feature in HW
Expand Down
11 changes: 1 addition & 10 deletions arch/s390/include/asm/pgtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -55,16 +55,7 @@ extern unsigned long zero_page_mask;
#define ZERO_PAGE(vaddr) \
(virt_to_page((void *)(empty_zero_page + \
(((unsigned long)(vaddr)) &zero_page_mask))))

#define is_zero_pfn is_zero_pfn
static inline int is_zero_pfn(unsigned long pfn)
{
extern unsigned long zero_pfn;
unsigned long offset_from_zero_pfn = pfn - zero_pfn;
return offset_from_zero_pfn <= (zero_page_mask >> PAGE_SHIFT);
}

#define my_zero_pfn(addr) page_to_pfn(ZERO_PAGE(addr))
#define __HAVE_COLOR_ZERO_PAGE

#endif /* !__ASSEMBLY__ */

Expand Down
19 changes: 7 additions & 12 deletions arch/sh/mm/fault.c
Original file line number Diff line number Diff line change
Expand Up @@ -301,17 +301,6 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
__bad_area(regs, error_code, address, SEGV_ACCERR);
}

static void out_of_memory(void)
{
/*
* We ran out of memory, call the OOM killer, and return the userspace
* (which will retry the fault, or kill us if we got oom-killed):
*/
up_read(&current->mm->mmap_sem);

pagefault_out_of_memory();
}

static void
do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address)
{
Expand Down Expand Up @@ -353,8 +342,14 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
no_context(regs, error_code, address);
return 1;
}
up_read(&current->mm->mmap_sem);

out_of_memory();
/*
* We ran out of memory, call the OOM killer, and return the
* userspace (which will retry the fault, or kill us if we got
* oom-killed):
*/
pagefault_out_of_memory();
} else {
if (fault & VM_FAULT_SIGBUS)
do_sigbus(regs, error_code, address);
Expand Down
2 changes: 1 addition & 1 deletion arch/x86/kernel/vm86_32.c
Original file line number Diff line number Diff line change
Expand Up @@ -182,7 +182,7 @@ static void mark_screen_rdonly(struct mm_struct *mm)
if (pud_none_or_clear_bad(pud))
goto out;
pmd = pmd_offset(pud, 0xA0000);
split_huge_page_pmd(mm, pmd);
split_huge_page_pmd_mm(mm, 0xA0000, pmd);
if (pmd_none_or_clear_bad(pmd))
goto out;
pte = pte_offset_map_lock(mm, pmd, 0xA0000, &ptl);
Expand Down
23 changes: 8 additions & 15 deletions arch/x86/mm/fault.c
Original file line number Diff line number Diff line change
Expand Up @@ -803,20 +803,6 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
__bad_area(regs, error_code, address, SEGV_ACCERR);
}

/* TODO: fixup for "mm-invoke-oom-killer-from-page-fault.patch" */
static void
out_of_memory(struct pt_regs *regs, unsigned long error_code,
unsigned long address)
{
/*
* We ran out of memory, call the OOM killer, and return the userspace
* (which will retry the fault, or kill us if we got oom-killed):
*/
up_read(&current->mm->mmap_sem);

pagefault_out_of_memory();
}

static void
do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
unsigned int fault)
Expand Down Expand Up @@ -879,7 +865,14 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
return 1;
}

out_of_memory(regs, error_code, address);
up_read(&current->mm->mmap_sem);

/*
* We ran out of memory, call the OOM killer, and return the
* userspace (which will retry the fault, or kill us if we got
* oom-killed):
*/
pagefault_out_of_memory();
} else {
if (fault & (VM_FAULT_SIGBUS|VM_FAULT_HWPOISON|
VM_FAULT_HWPOISON_LARGE))
Expand Down
4 changes: 3 additions & 1 deletion arch/x86/mm/init_64.c
Original file line number Diff line number Diff line change
Expand Up @@ -630,7 +630,9 @@ void __init paging_init(void)
* numa support is not compiled in, and later node_set_state
* will not set it back.
*/
node_clear_state(0, N_NORMAL_MEMORY);
node_clear_state(0, N_MEMORY);
if (N_MEMORY != N_NORMAL_MEMORY)
node_clear_state(0, N_NORMAL_MEMORY);

zone_sizes_init();
}
Expand Down
8 changes: 7 additions & 1 deletion drivers/base/node.c
Original file line number Diff line number Diff line change
Expand Up @@ -227,7 +227,7 @@ static node_registration_func_t __hugetlb_unregister_node;
static inline bool hugetlb_register_node(struct node *node)
{
if (__hugetlb_register_node &&
node_state(node->dev.id, N_HIGH_MEMORY)) {
node_state(node->dev.id, N_MEMORY)) {
__hugetlb_register_node(node);
return true;
}
Expand Down Expand Up @@ -643,6 +643,9 @@ static struct node_attr node_state_attr[] = {
[N_NORMAL_MEMORY] = _NODE_ATTR(has_normal_memory, N_NORMAL_MEMORY),
#ifdef CONFIG_HIGHMEM
[N_HIGH_MEMORY] = _NODE_ATTR(has_high_memory, N_HIGH_MEMORY),
#endif
#ifdef CONFIG_MOVABLE_NODE
[N_MEMORY] = _NODE_ATTR(has_memory, N_MEMORY),
#endif
[N_CPU] = _NODE_ATTR(has_cpu, N_CPU),
};
Expand All @@ -653,6 +656,9 @@ static struct attribute *node_state_attrs[] = {
&node_state_attr[N_NORMAL_MEMORY].attr.attr,
#ifdef CONFIG_HIGHMEM
&node_state_attr[N_HIGH_MEMORY].attr.attr,
#endif
#ifdef CONFIG_MOVABLE_NODE
&node_state_attr[N_MEMORY].attr.attr,
#endif
&node_state_attr[N_CPU].attr.attr,
NULL
Expand Down
6 changes: 1 addition & 5 deletions fs/buffer.c
Original file line number Diff line number Diff line change
Expand Up @@ -46,8 +46,7 @@ static int fsync_buffers_list(spinlock_t *lock, struct list_head *list);

#define BH_ENTRY(list) list_entry((list), struct buffer_head, b_assoc_buffers)

inline void
init_buffer(struct buffer_head *bh, bh_end_io_t *handler, void *private)
void init_buffer(struct buffer_head *bh, bh_end_io_t *handler, void *private)
{
bh->b_end_io = handler;
bh->b_private = private;
Expand Down Expand Up @@ -850,13 +849,10 @@ struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
if (!bh)
goto no_grow;

bh->b_bdev = NULL;
bh->b_this_page = head;
bh->b_blocknr = -1;
head = bh;

bh->b_state = 0;
atomic_set(&bh->b_count, 0);
bh->b_size = size;

/* Link the buffer to its page */
Expand Down
2 changes: 1 addition & 1 deletion fs/fs-writeback.c
Original file line number Diff line number Diff line change
Expand Up @@ -1034,7 +1034,7 @@ int bdi_writeback_thread(void *data)
while (!kthread_freezable_should_stop(NULL)) {
/*
* Remove own delayed wake-up timer, since we are already awake
* and we'll take care of the preriodic write-back.
* and we'll take care of the periodic write-back.
*/
del_timer(&wb->wakeup_timer);

Expand Down
2 changes: 1 addition & 1 deletion fs/proc/kcore.c
Original file line number Diff line number Diff line change
Expand Up @@ -249,7 +249,7 @@ static int kcore_update_ram(void)
/* Not inialized....update now */
/* find out "max pfn" */
end_pfn = 0;
for_each_node_state(nid, N_HIGH_MEMORY) {
for_each_node_state(nid, N_MEMORY) {
unsigned long node_end;
node_end = NODE_DATA(nid)->node_start_pfn +
NODE_DATA(nid)->node_spanned_pages;
Expand Down
6 changes: 3 additions & 3 deletions fs/proc/task_mmu.c
Original file line number Diff line number Diff line change
Expand Up @@ -643,7 +643,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
spinlock_t *ptl;
struct page *page;

split_huge_page_pmd(walk->mm, pmd);
split_huge_page_pmd(vma, addr, pmd);
if (pmd_trans_unstable(pmd))
return 0;

Expand Down Expand Up @@ -1126,7 +1126,7 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
return NULL;

nid = page_to_nid(page);
if (!node_isset(nid, node_states[N_HIGH_MEMORY]))
if (!node_isset(nid, node_states[N_MEMORY]))
return NULL;

return page;
Expand Down Expand Up @@ -1279,7 +1279,7 @@ static int show_numa_map(struct seq_file *m, void *v, int is_pid)
if (md->writeback)
seq_printf(m, " writeback=%lu", md->writeback);

for_each_node_state(n, N_HIGH_MEMORY)
for_each_node_state(n, N_MEMORY)
if (md->node[n])
seq_printf(m, " N%d=%lu", n, md->node[n]);
out:
Expand Down
26 changes: 26 additions & 0 deletions include/asm-generic/pgtable.h
Original file line number Diff line number Diff line change
Expand Up @@ -449,6 +449,32 @@ extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
unsigned long size);
#endif

#ifdef __HAVE_COLOR_ZERO_PAGE
static inline int is_zero_pfn(unsigned long pfn)
{
extern unsigned long zero_pfn;
unsigned long offset_from_zero_pfn = pfn - zero_pfn;
return offset_from_zero_pfn <= (zero_page_mask >> PAGE_SHIFT);
}

static inline unsigned long my_zero_pfn(unsigned long addr)
{
return page_to_pfn(ZERO_PAGE(addr));
}
#else
static inline int is_zero_pfn(unsigned long pfn)
{
extern unsigned long zero_pfn;
return pfn == zero_pfn;
}

static inline unsigned long my_zero_pfn(unsigned long addr)
{
extern unsigned long zero_pfn;
return zero_pfn;
}
#endif

#ifdef CONFIG_MMU

#ifndef CONFIG_TRANSPARENT_HUGEPAGE
Expand Down
3 changes: 0 additions & 3 deletions include/linux/bootmem.h
Original file line number Diff line number Diff line change
Expand Up @@ -137,9 +137,6 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
#define alloc_bootmem_low_pages_node(pgdat, x) \
__alloc_bootmem_low_node(pgdat, x, PAGE_SIZE, 0)

extern int reserve_bootmem_generic(unsigned long addr, unsigned long size,
int flags);

#ifdef CONFIG_HAVE_ARCH_ALLOC_REMAP
extern void *alloc_remap(int nid, unsigned long size);
#else
Expand Down
2 changes: 1 addition & 1 deletion include/linux/cpuset.h
Original file line number Diff line number Diff line change
Expand Up @@ -144,7 +144,7 @@ static inline nodemask_t cpuset_mems_allowed(struct task_struct *p)
return node_possible_map;
}

#define cpuset_current_mems_allowed (node_states[N_HIGH_MEMORY])
#define cpuset_current_mems_allowed (node_states[N_MEMORY])
static inline void cpuset_init_current_mems_allowed(void) {}

static inline int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask)
Expand Down
Loading

0 comments on commit f6e858a

Please sign in to comment.