This repo contains MLNS for small values of N, as well as the generator used to generate them.
A N-sorter1 is a circuit that, given N input values, returns them sorted.
A 2-sorter, commonly used in sorting networks, is a special case of N-sorter;
a 2-sorter implements the expression {o0, o1} = i0 < i1 ? {i0, i1} : {i1, i0}
.
A MLNS is an N-sorter that aims at ensuring that the sorting happens with the smallest possible latency. Specifically, the latency virtually does not depend on the number of inputs N2, as all comparisons are done in parallel (instead of in serial stages like they are done in sorting networks).
To generate the N-sorter for a specifc value of N
, run: ruby mlnsgen/mlnsgen.rb <N>
(e.g. ruby mlnsgen/mlnsgen.rb 4
) or ruby mlnsgen/mlnsgenfast.rb <N>
.
The current algorithm performs an exhaustive search (i.e. uses superlinear time), so it is too slow for values of N much higher than 8 (and furthermore the resulting Verilog code that implements such wide N-sorters becomes impractically large for synthesis tools to handle efficiently).
This generator directly synthesizes the cases. It can be used to generate larger sorters (but keep in mind that sorters larger than 10 are going to result in gigabytes of Verilog code).
The files nsorter_*.v
contain the generated n-sorters.
Just to provide a rough idea, these are the currently available generated n-sorters and their characteristics:
N | Comparisons | Orderings | Gates3 | Area (µm²) |
---|---|---|---|---|
2 | 1 | 2 | 447 | 6906.64 |
3 | 3 | 6 | 1736 | 14589.04 |
4 | 6 | 24 | 3543 | 27781.64 |
5 | 10 | 120 | 6366 | 61711.75 |
6 | 15 | 720 | 12551 | ? |
7 | 21 | 5040 | 41387 | ? |
8 | 28 | 40320 | 250448 | ? |
9 | 36 | 362880 | ? | ? |
... | ... | ... | ... | ... |
? | ? |
SiliconCompiler 0.30.0 yields the following for the smaller n-sorters:
Footnotes
-
Sometimes also called k-sorter. ↩
-
Specifically, the number of stages does not depend on the number of inputs, like it does in sorting networks, because there is a single comparison stage and a single selection stage regardless of the number of inputs N, and the latency of the comparison stage is constant regardless of number of inputs N. The selection stage though does contain N-way mux, so the latency of the selection stage is proportional to log2(N). ↩
-
SiliconCompiler 0.30.0, skywater130 process, density 50; 64 bit values. ↩ ↩2 ↩3
-
SiliconCompiler 0.30.0, skywater130 process, density 40; 64 bit values. ↩