Skip to content

Commit

Permalink
Jemalloc updated to 4.0.3.
Browse files Browse the repository at this point in the history
  • Loading branch information
antirez committed Oct 6, 2015
1 parent e3ded02 commit a9951b1
Show file tree
Hide file tree
Showing 137 changed files with 19,880 additions and 10,021 deletions.
3 changes: 3 additions & 0 deletions deps/jemalloc/.autom4te.cfg
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
begin-language: "Autoconf-without-aclocal-m4"
args: --no-cache
end-language: "Autoconf-without-aclocal-m4"
1 change: 1 addition & 0 deletions deps/jemalloc/.gitattributes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
* text=auto eol=lf
7 changes: 5 additions & 2 deletions deps/jemalloc/.gitignore
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
/*.gcov.*

/autom4te.cache/

/bin/jemalloc-config
/bin/jemalloc.sh
/bin/jeprof

/config.stamp
/config.log
Expand All @@ -15,6 +15,8 @@
/doc/jemalloc.html
/doc/jemalloc.3

/jemalloc.pc

/lib/

/Makefile
Expand All @@ -35,6 +37,7 @@
/include/jemalloc/jemalloc_protos.h
/include/jemalloc/jemalloc_protos_jet.h
/include/jemalloc/jemalloc_rename.h
/include/jemalloc/jemalloc_typedefs.h

/src/*.[od]
/src/*.gcda
Expand Down
4 changes: 2 additions & 2 deletions deps/jemalloc/COPYING
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
Unless otherwise specified, files in the jemalloc source distribution are
subject to the following license:
--------------------------------------------------------------------------------
Copyright (C) 2002-2014 Jason Evans <[email protected]>.
Copyright (C) 2002-2015 Jason Evans <[email protected]>.
All rights reserved.
Copyright (C) 2007-2012 Mozilla Foundation. All rights reserved.
Copyright (C) 2009-2014 Facebook, Inc. All rights reserved.
Copyright (C) 2009-2015 Facebook, Inc. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
Expand Down
250 changes: 245 additions & 5 deletions deps/jemalloc/ChangeLog
Original file line number Diff line number Diff line change
@@ -1,10 +1,250 @@
Following are change highlights associated with official releases. Important
bug fixes are all mentioned, but internal enhancements are omitted here for
brevity (even though they are more fun to write about). Much more detail can be
found in the git revision history:
bug fixes are all mentioned, but some internal enhancements are omitted here for
brevity. Much more detail can be found in the git revision history:

https://github.com/jemalloc/jemalloc

* 4.0.3 (September 24, 2015)

This bugfix release continues the trend of xallocx() and heap profiling fixes.

Bug fixes:
- Fix xallocx(..., MALLOCX_ZERO) to zero all trailing bytes of large
allocations when --enable-cache-oblivious configure option is enabled.
- Fix xallocx(..., MALLOCX_ZERO) to zero trailing bytes of huge allocations
when resizing from/to a size class that is not a multiple of the chunk size.
- Fix prof_tctx_dump_iter() to filter out nodes that were created after heap
profile dumping started.
- Work around a potentially bad thread-specific data initialization
interaction with NPTL (glibc's pthreads implementation).

* 4.0.2 (September 21, 2015)

This bugfix release addresses a few bugs specific to heap profiling.

Bug fixes:
- Fix ixallocx_prof_sample() to never modify nor create sampled small
allocations. xallocx() is in general incapable of moving small allocations,
so this fix removes buggy code without loss of generality.
- Fix irallocx_prof_sample() to always allocate large regions, even when
alignment is non-zero.
- Fix prof_alloc_rollback() to read tdata from thread-specific data rather
than dereferencing a potentially invalid tctx.

* 4.0.1 (September 15, 2015)

This is a bugfix release that is somewhat high risk due to the amount of
refactoring required to address deep xallocx() problems. As a side effect of
these fixes, xallocx() now tries harder to partially fulfill requests for
optional extra space. Note that a couple of minor heap profiling
optimizations are included, but these are better thought of as performance
fixes that were integral to disovering most of the other bugs.

Optimizations:
- Avoid a chunk metadata read in arena_prof_tctx_set(), since it is in the
fast path when heap profiling is enabled. Additionally, split a special
case out into arena_prof_tctx_reset(), which also avoids chunk metadata
reads.
- Optimize irallocx_prof() to optimistically update the sampler state. The
prior implementation appears to have been a holdover from when
rallocx()/xallocx() functionality was combined as rallocm().

Bug fixes:
- Fix TLS configuration such that it is enabled by default for platforms on
which it works correctly.
- Fix arenas_cache_cleanup() and arena_get_hard() to handle
allocation/deallocation within the application's thread-specific data
cleanup functions even after arenas_cache is torn down.
- Fix xallocx() bugs related to size+extra exceeding HUGE_MAXCLASS.
- Fix chunk purge hook calls for in-place huge shrinking reallocation to
specify the old chunk size rather than the new chunk size. This bug caused
no correctness issues for the default chunk purge function, but was
visible to custom functions set via the "arena.<i>.chunk_hooks" mallctl.
- Fix heap profiling bugs:
+ Fix heap profiling to distinguish among otherwise identical sample sites
with interposed resets (triggered via the "prof.reset" mallctl). This bug
could cause data structure corruption that would most likely result in a
segfault.
+ Fix irealloc_prof() to prof_alloc_rollback() on OOM.
+ Make one call to prof_active_get_unlocked() per allocation event, and use
the result throughout the relevant functions that handle an allocation
event. Also add a missing check in prof_realloc(). These fixes protect
allocation events against concurrent prof_active changes.
+ Fix ixallocx_prof() to pass usize_max and zero to ixallocx_prof_sample()
in the correct order.
+ Fix prof_realloc() to call prof_free_sampled_object() after calling
prof_malloc_sample_object(). Prior to this fix, if tctx and old_tctx were
the same, the tctx could have been prematurely destroyed.
- Fix portability bugs:
+ Don't bitshift by negative amounts when encoding/decoding run sizes in
chunk header maps. This affected systems with page sizes greater than 8
KiB.
+ Rename index_t to szind_t to avoid an existing type on Solaris.
+ Add JEMALLOC_CXX_THROW to the memalign() function prototype, in order to
match glibc and avoid compilation errors when including both
jemalloc/jemalloc.h and malloc.h in C++ code.
+ Don't assume that /bin/sh is appropriate when running size_classes.sh
during configuration.
+ Consider __sparcv9 a synonym for __sparc64__ when defining LG_QUANTUM.
+ Link tests to librt if it contains clock_gettime(2).

* 4.0.0 (August 17, 2015)

This version contains many speed and space optimizations, both minor and
major. The major themes are generalization, unification, and simplification.
Although many of these optimizations cause no visible behavior change, their
cumulative effect is substantial.

New features:
- Normalize size class spacing to be consistent across the complete size
range. By default there are four size classes per size doubling, but this
is now configurable via the --with-lg-size-class-group option. Also add the
--with-lg-page, --with-lg-page-sizes, --with-lg-quantum, and
--with-lg-tiny-min options, which can be used to tweak page and size class
settings. Impacts:
+ Worst case performance for incrementally growing/shrinking reallocation
is improved because there are far fewer size classes, and therefore
copying happens less often.
+ Internal fragmentation is limited to 20% for all but the smallest size
classes (those less than four times the quantum). (1B + 4 KiB)
and (1B + 4 MiB) previously suffered nearly 50% internal fragmentation.
+ Chunk fragmentation tends to be lower because there are fewer distinct run
sizes to pack.
- Add support for explicit tcaches. The "tcache.create", "tcache.flush", and
"tcache.destroy" mallctls control tcache lifetime and flushing, and the
MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to the *allocx() API
control which tcache is used for each operation.
- Implement per thread heap profiling, as well as the ability to
enable/disable heap profiling on a per thread basis. Add the "prof.reset",
"prof.lg_sample", "thread.prof.name", "thread.prof.active",
"opt.prof_thread_active_init", "prof.thread_active_init", and
"thread.prof.active" mallctls.
- Add support for per arena application-specified chunk allocators, configured
via the "arena.<i>.chunk_hooks" mallctl.
- Refactor huge allocation to be managed by arenas, so that arenas now
function as general purpose independent allocators. This is important in
the context of user-specified chunk allocators, aside from the scalability
benefits. Related new statistics:
+ The "stats.arenas.<i>.huge.allocated", "stats.arenas.<i>.huge.nmalloc",
"stats.arenas.<i>.huge.ndalloc", and "stats.arenas.<i>.huge.nrequests"
mallctls provide high level per arena huge allocation statistics.
+ The "arenas.nhchunks", "arenas.hchunk.<i>.size",
"stats.arenas.<i>.hchunks.<j>.nmalloc",
"stats.arenas.<i>.hchunks.<j>.ndalloc",
"stats.arenas.<i>.hchunks.<j>.nrequests", and
"stats.arenas.<i>.hchunks.<j>.curhchunks" mallctls provide per size class
statistics.
- Add the 'util' column to malloc_stats_print() output, which reports the
proportion of available regions that are currently in use for each small
size class.
- Add "alloc" and "free" modes for for junk filling (see the "opt.junk"
mallctl), so that it is possible to separately enable junk filling for
allocation versus deallocation.
- Add the jemalloc-config script, which provides information about how
jemalloc was configured, and how to integrate it into application builds.
- Add metadata statistics, which are accessible via the "stats.metadata",
"stats.arenas.<i>.metadata.mapped", and
"stats.arenas.<i>.metadata.allocated" mallctls.
- Add the "stats.resident" mallctl, which reports the upper limit of
physically resident memory mapped by the allocator.
- Add per arena control over unused dirty page purging, via the
"arenas.lg_dirty_mult", "arena.<i>.lg_dirty_mult", and
"stats.arenas.<i>.lg_dirty_mult" mallctls.
- Add the "prof.gdump" mallctl, which makes it possible to toggle the gdump
feature on/off during program execution.
- Add sdallocx(), which implements sized deallocation. The primary
optimization over dallocx() is the removal of a metadata read, which often
suffers an L1 cache miss.
- Add missing header includes in jemalloc/jemalloc.h, so that applications
only have to #include <jemalloc/jemalloc.h>.
- Add support for additional platforms:
+ Bitrig
+ Cygwin
+ DragonFlyBSD
+ iOS
+ OpenBSD
+ OpenRISC/or1k

Optimizations:
- Maintain dirty runs in per arena LRUs rather than in per arena trees of
dirty-run-containing chunks. In practice this change significantly reduces
dirty page purging volume.
- Integrate whole chunks into the unused dirty page purging machinery. This
reduces the cost of repeated huge allocation/deallocation, because it
effectively introduces a cache of chunks.
- Split the arena chunk map into two separate arrays, in order to increase
cache locality for the frequently accessed bits.
- Move small run metadata out of runs, into arena chunk headers. This reduces
run fragmentation, smaller runs reduce external fragmentation for small size
classes, and packed (less uniformly aligned) metadata layout improves CPU
cache set distribution.
- Randomly distribute large allocation base pointer alignment relative to page
boundaries in order to more uniformly utilize CPU cache sets. This can be
disabled via the --disable-cache-oblivious configure option, and queried via
the "config.cache_oblivious" mallctl.
- Micro-optimize the fast paths for the public API functions.
- Refactor thread-specific data to reside in a single structure. This assures
that only a single TLS read is necessary per call into the public API.
- Implement in-place huge allocation growing and shrinking.
- Refactor rtree (radix tree for chunk lookups) to be lock-free, and make
additional optimizations that reduce maximum lookup depth to one or two
levels. This resolves what was a concurrency bottleneck for per arena huge
allocation, because a global data structure is critical for determining
which arenas own which huge allocations.

Incompatible changes:
- Replace --enable-cc-silence with --disable-cc-silence to suppress spurious
warnings by default.
- Assure that the constness of malloc_usable_size()'s return type matches that
of the system implementation.
- Change the heap profile dump format to support per thread heap profiling,
rename pprof to jeprof, and enhance it with the --thread=<n> option. As a
result, the bundled jeprof must now be used rather than the upstream
(gperftools) pprof.
- Disable "opt.prof_final" by default, in order to avoid atexit(3), which can
internally deadlock on some platforms.
- Change the "arenas.nlruns" mallctl type from size_t to unsigned.
- Replace the "stats.arenas.<i>.bins.<j>.allocated" mallctl with
"stats.arenas.<i>.bins.<j>.curregs".
- Ignore MALLOC_CONF in set{uid,gid,cap} binaries.
- Ignore MALLOCX_ARENA(a) in dallocx(), in favor of using the
MALLOCX_TCACHE(tc) and MALLOCX_TCACHE_NONE flags to control tcache usage.

Removed features:
- Remove the *allocm() API, which is superseded by the *allocx() API.
- Remove the --enable-dss options, and make dss non-optional on all platforms
which support sbrk(2).
- Remove the "arenas.purge" mallctl, which was obsoleted by the
"arena.<i>.purge" mallctl in 3.1.0.
- Remove the unnecessary "opt.valgrind" mallctl; jemalloc automatically
detects whether it is running inside Valgrind.
- Remove the "stats.huge.allocated", "stats.huge.nmalloc", and
"stats.huge.ndalloc" mallctls.
- Remove the --enable-mremap option.
- Remove the "stats.chunks.current", "stats.chunks.total", and
"stats.chunks.high" mallctls.

Bug fixes:
- Fix the cactive statistic to decrease (rather than increase) when active
memory decreases. This regression was first released in 3.5.0.
- Fix OOM handling in memalign() and valloc(). A variant of this bug existed
in all releases since 2.0.0, which introduced these functions.
- Fix an OOM-related regression in arena_tcache_fill_small(), which could
cause cache corruption on OOM. This regression was present in all releases
from 2.2.0 through 3.6.0.
- Fix size class overflow handling for malloc(), posix_memalign(), memalign(),
calloc(), and realloc() when profiling is enabled.
- Fix the "arena.<i>.dss" mallctl to return an error if "primary" or
"secondary" precedence is specified, but sbrk(2) is not supported.
- Fix fallback lg_floor() implementations to handle extremely large inputs.
- Ensure the default purgeable zone is after the default zone on OS X.
- Fix latent bugs in atomic_*().
- Fix the "arena.<i>.dss" mallctl to handle read-only calls.
- Fix tls_model configuration to enable the initial-exec model when possible.
- Mark malloc_conf as a weak symbol so that the application can override it.
- Correctly detect glibc's adaptive pthread mutexes.
- Fix the --without-export configure option.

* 3.6.0 (March 31, 2014)

This version contains a critical bug fix for a regression present in 3.5.0 and
Expand All @@ -21,7 +261,7 @@ found in the git revision history:
backtracing to be reliable.
- Use dss allocation precedence for huge allocations as well as small/large
allocations.
- Fix test assertion failure message formatting. This bug did not manifect on
- Fix test assertion failure message formatting. This bug did not manifest on
x86_64 systems because of implementation subtleties in va_list.
- Fix inconsequential test failures for hash and SFMT code.

Expand Down Expand Up @@ -516,7 +756,7 @@ found in the git revision history:
- Make it possible for the application to manually flush a thread's cache, via
the "tcache.flush" mallctl.
- Base maximum dirty page count on proportion of active memory.
- Compute various addtional run-time statistics, including per size class
- Compute various additional run-time statistics, including per size class
statistics for large objects.
- Expose malloc_stats_print(), which can be called repeatedly by the
application.
Expand Down
Loading

0 comments on commit a9951b1

Please sign in to comment.