- O(N * log(N)) time
- O(sqrt(N)) memory (with constant pointer size)
- adaptive (fast for mostly pre-sorted inputs)
- stable (order of equal elements is unchanged)
The core idea is:
- Divide the data into blocks of sqrt(N) size, and sort each block.
- Use sqrt(N) indices to track which block is where.
- Clear some blocks by merging data into scratch memory.
- Merge blocks of data from where they are into whichever blocks are empty.
Here's an example of merging blocks:
-
The data is divided into blocks.
aaaabbbb
-
Merging of {a, b} into {C} starts. The blocks of {a, b} are merged into a buffer, starting with the first ones. When blocks are used completely, they're cleared in the main array. Here, {b} is first in the sort, so blocks of {b} are cleared first.
aaaa__bb
-
The merged run data {C} is written into the space left by cleared {b} blocks. The location indices are stored in a locations array. Here, locations[i] is the index where the ith output data block can be found.
aaaaCC_b
45______
-
Here, {a, b} are fully merged into {C} and the locations are stored. If another merge is done, blocks are used from their current positions.
CCCCCCCC
45670123
-
Once all data has been merged, the blocks are sorted according to their location indices.
For large arrays, compared to std::stable_sort with unlimited memory, this implementation takes:
- ~1.1x the time for random inputs
- ~0.1x the time for sorted inputs
Here are some times I got from running the included code.
The block locations are fixed while elements move between them, like books being moved between shelves. Also, "block sort" was taken.
Most sorting algorithms are either unstable, slow, or require O(N) memory. There are merge sorts using binary search to partition blocks to reduce memory consumption, such as blitsort, but this is a different approach that doesn't use binary search.
- It's an intellectual challenge that's "pure" in a way that I like.
- It's a way to test yourself against some of the best minds of history.
- If you can develop something situationally useful, you can leave some small mark on the world.
- I'm currently unemployed, and the weather's been bad, so I had some free time.
- It will probably get HR people to not throw out my resume. <- this is sarcasm
It's written in low-level C++ and does some branchless operations, but apart from that, it's not very optimized. Things I haven't done include:
- trying initial sorts for small segments that handle >4 elements
- looking at the assembly code
- manual vectorization
Micro-optimization isn't my specialty or my interest, but feel free to try doing some.
- To simplify implementation and discourage people from using this code in production, it currently only works on power-of-2 array sizes.
- It hasn't been tested thoroughly, and could have bugs on some inputs.
I also programmed:
- a fork of a roguelike game
- a proof of concept for a new hashtable algorithm
b αt bhauth dot com