add: yespower-1.0.1
decryp2kanon committed Mar 8, 2020
1 parent f9ef30f commit ffd26ad
Showing 14 changed files with 3,626 additions and 0 deletions.
18 changes: 18 additions & 0 deletions yespower-1.0.1/CHANGES
@@ -0,0 +1,18 @@
Changes made between 1.0.0 (2018/07/12) and 1.0.1 (2019/06/30).

Fill the destination buffer with all set bits on error for fail-safety
of the caller's "< target" check in case the caller neglects to check
for errors.
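
A minimal sketch of why the all-set-bits fill is fail-safe, written in
Python rather than C so it runs on its own. The function name, the
little-endian reading of the hash, and the targets are assumptions made
for illustration; they are not part of yespower's API.

```python
# Sketch only: names and byte order below are illustrative assumptions.
HASH_SIZE = 32  # the PoW hash is 32 bytes (256 bits)


def below_target(pow_hash: bytes, target: int) -> bool:
    """Caller-side "< target" check on a 256-bit PoW hash."""
    # Assumed convention: read the hash as a little-endian integer,
    # as Bitcoin-derived code commonly does.
    return int.from_bytes(pow_hash, "little") < target


# Contents of the destination buffer after a failed call in 1.0.1.
error_fill = b"\xff" * HASH_SIZE

# Even the most permissive 256-bit target rejects the error fill.
easiest_target = (1 << 256) - 1
print(below_target(error_fill, easiest_target))  # False

# For contrast, an all-zero buffer would pass any nonzero target.
print(below_target(b"\x00" * HASH_SIZE, 1))  # True
```

With all bits set, the value is the largest possible 256-bit number
under either byte order, so a caller that forgets to check the return
value still cannot mistake a failed call for a valid proof.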

Simplified SMix2 for its final invocation with Nloop=2 in yespower 0.5.

Revised the "XOR of yespower" tests to trigger duplicate index in the
last SMix2 invocation in yespower 0.5 for N=2048 with at least one of
the values of r being tested. This is needed to test that a proper
kind of BlockMix is used in that special case, which would previously be
left untested.

Added x32 ABI support (x86-64 with 32-bit pointers).

Added a bit more detail to the README on caching of the computed PoW
hashes when integrating yespower in an altcoin based on Bitcoin Core.
95 changes: 95 additions & 0 deletions yespower-1.0.1/PERFORMANCE
@@ -0,0 +1,95 @@
Included with yespower is the "benchmark" program, which is built by
simply invoking "make". When invoked without parameters, it tests
yespower 0.5 at N = 2048, r = 8, which appears to be the lowest setting
in use by existing cryptocurrencies. On an i7-4770K with 4x DDR3-1600
(on two memory channels) running CentOS 7 for x86-64 (and built with
CentOS 7's default version of gcc) and with thread affinity set, this
reports between 3700 and 3800 hashes per second for both SSE2 and AVX
builds, e.g.:

$ GOMP_CPU_AFFINITY=0-7 OMP_NUM_THREADS=4 ./benchmark
version=0.5 N=2048 r=8
Will use 2048.00 KiB RAM
a5 9f ec 4c 4f dd a1 6e 3b 14 05 ad da 66 d5 25 b6 8e 7c ad fc fe 6a c0 66 c7 ad 11 8c d8 05 90
Benchmarking 1 thread ...
1018 H/s real, 1018 H/s virtual (2047 hashes in 2.01 seconds)
Benchmarking 4 threads ...
3773 H/s real, 950 H/s virtual (8188 hashes in 2.17 seconds)
min 0.984 ms, avg 1.052 ms, max 1.074 ms

Running 8 threads (to match the logical rather than the physical CPU
core count) results in very slightly worse performance on this system,
but this might be the other way around on another system and/or with
other parameters. With an upgrade to yespower 1.0, performance at these
parameters improves to almost 4000 hashes per second:

$ GOMP_CPU_AFFINITY=0-7 OMP_NUM_THREADS=4 ./benchmark 10
version=1.0 N=2048 r=8
Will use 2048.00 KiB RAM
d0 78 cd d4 cf 3f 5a a8 4e 3c 4a 58 66 29 81 d8 2d 27 e5 67 36 37 c4 be 77 63 61 32 24 c1 8a 93
Benchmarking 1 thread ...
1080 H/s real, 1080 H/s virtual (4095 hashes in 3.79 seconds)
Benchmarking 4 threads ...
3995 H/s real, 1011 H/s virtual (16380 hashes in 4.10 seconds)
min 0.923 ms, avg 0.989 ms, max 1.137 ms

Running 8 threads results in a substantial slowdown with this new
version (to between 3200 and 3400 hashes per second) because of cache
thrashing.

For higher settings such as those achieving 8 MiB instead of the 2 MiB
above, this system performs at around 800 hashes per second for yespower
0.5 and at around 830 hashes per second for yespower 1.0:

$ GOMP_CPU_AFFINITY=0-7 OMP_NUM_THREADS=4 ./benchmark 5 2048 32
version=0.5 N=2048 r=32
Will use 8192.00 KiB RAM
56 0a 89 1b 5c a2 e1 c6 36 11 1a 9f f7 c8 94 a5 d0 a2 60 2f 43 fd cf a5 94 9b 95 e2 2f e4 46 1e
Benchmarking 1 thread ...
265 H/s real, 265 H/s virtual (1023 hashes in 3.85 seconds)
Benchmarking 4 threads ...
803 H/s real, 200 H/s virtual (4092 hashes in 5.09 seconds)
min 4.924 ms, avg 4.980 ms, max 5.074 ms

$ GOMP_CPU_AFFINITY=0-7 OMP_NUM_THREADS=4 ./benchmark 10 2048 32
version=1.0 N=2048 r=32
Will use 8192.00 KiB RAM
f7 69 26 ae 4a dc 56 53 90 2f f0 22 78 ea aa 39 eb 99 84 11 ac 3e a6 24 2e 19 6d fb c4 3d 68 25
Benchmarking 1 thread ...
275 H/s real, 275 H/s virtual (1023 hashes in 3.71 seconds)
Benchmarking 4 threads ...
831 H/s real, 209 H/s virtual (4092 hashes in 4.92 seconds)
min 3.614 ms, avg 4.769 ms, max 5.011 ms

Again, running 8 threads results in a slowdown, albeit not as bad as can
be seen for lower settings.

On x86(-64), the following code versions may reasonably be built: SSE2,
AVX, and XOP. (There's no reason to build for AVX2 and higher, which
are unsuitable for and thus unused by current yespower anyway. There's
also no reason to build yespower as-is for SSE4, although there's a
32-bit specific SSE4 code version, disabled by default, that may be
re-enabled and given a try if someone is so inclined; it may perform
slightly slower or slightly faster across different systems.)

yescrypt and especially yespower 1.0 have been designed to fit the SSE2
instruction set almost perfectly, so there's very little benefit from
the AVX and XOP builds, yet even at yespower 1.0 there may be
performance differences between SSE2, AVX, and XOP builds within 2% or
so (and it is unclear which is the fastest on a given system until
tested, except that where XOP is supported it is almost always faster
than AVX).

Proper setting of thread affinities to run exactly one thread per
physical CPU core is non-trivial. In the above examples, it so happened
that the first 4 logical CPU numbers corresponded to different physical
cores, but this won't always be the case. This can vary even between
apparently similar systems. On Linux, the mapping of logical CPUs to
physical cores may be obtained from /proc/cpuinfo (on x86[-64] and MIC)
or sysfs, which an optimized implementation of e.g. a cryptocurrency
miner could use. If you do not bother obtaining this information from
the operating system, you might be better off not setting thread
affinities at all (in order to avoid the risk of doing this incorrectly,
which would have a greater negative performance impact) and/or running
as many threads as there are logical CPUs. Also, there's no certainty
whether different and future CPUs will run yespower faster using one or
maybe more threads per physical core.
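
As a rough illustration of obtaining the mapping from sysfs, the Python
sketch below picks one logical CPU per physical core. The function names
are hypothetical, and the code assumes the usual Linux layout with
topology/physical_package_id and topology/core_id under
/sys/devices/system/cpu/cpuN; treat it as a starting point, not as a
tested miner component.

```python
import glob
import os
import re


def one_cpu_per_core(topology):
    """Given {logical_cpu: (package_id, core_id)}, return the
    lowest-numbered logical CPU of each physical core."""
    chosen = {}
    for cpu in sorted(topology):
        # Keep only the first logical CPU seen for each physical core.
        chosen.setdefault(topology[cpu], cpu)
    return sorted(chosen.values())


def read_linux_topology(base="/sys/devices/system/cpu"):
    """Read the logical-CPU-to-core mapping from sysfs (Linux only)."""
    topology = {}
    for path in glob.glob(os.path.join(base, "cpu[0-9]*")):
        match = re.fullmatch(r"cpu(\d+)", os.path.basename(path))
        topo_dir = os.path.join(path, "topology")
        if not match or not os.path.isdir(topo_dir):
            continue  # skip entries without a topology directory
        with open(os.path.join(topo_dir, "physical_package_id")) as f:
            package_id = int(f.read())
        with open(os.path.join(topo_dir, "core_id")) as f:
            core_id = int(f.read())
        topology[int(match.group(1))] = (package_id, core_id)
    return topology


# Two hypothetical 4-core, 8-thread layouts that number siblings differently.
spread = {cpu: (0, cpu % 4) for cpu in range(8)}     # siblings are N and N+4
adjacent = {cpu: (0, cpu // 2) for cpu in range(8)}  # siblings are 2N and 2N+1
print(one_cpu_per_core(spread))    # [0, 1, 2, 3]
print(one_cpu_per_core(adjacent))  # [0, 2, 4, 6]
```

On the second layout, pinning 4 threads to logical CPUs 0-3 would place
them on only two physical cores, which is the kind of mistake the text
above warns about. The resulting list could be passed on through
GOMP_CPU_AFFINITY or a call such as os.sched_setaffinity().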
203 changes: 203 additions & 0 deletions yespower-1.0.1/README
@@ -0,0 +1,203 @@
What is yespower?

yespower is a proof-of-work (PoW) focused fork of yescrypt. While
yescrypt is a password-based key derivation function (KDF) and password
hashing scheme, and thus is meant for processing passwords, yespower is
meant for processing trial inputs such as block headers (including
nonces) in PoW-based blockchains.

On its own, yespower isn't a complete proof-of-work system. Rather, in
the blockchain use case, yespower's return value is meant to be checked
for being numerically no greater than the blockchain's current target
(which is related to mining difficulty) or else the proof attempt
(yespower invocation) is to be repeated (with a different nonce) until
the condition is finally met (allowing a new block to be mined). This
process isn't specific to yespower and isn't part of yespower itself
(rather, it is similar in many PoW-based blockchains and is to be
defined and implemented externally to yespower) and thus isn't described
in here any further.
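
Purely as an illustration of that external loop, here is a Python
sketch. SHA-256 stands in for the yespower call so that the example runs
on its own; the function names, the 4-byte little-endian nonce appended
to the header, and the little-endian reading of the hash are all
assumptions, not something yespower defines.

```python
import hashlib
import struct


def stand_in_pow_hash(data: bytes) -> bytes:
    """Placeholder for a yespower call; SHA-256 only keeps this runnable."""
    return hashlib.sha256(data).digest()


def find_nonce(header_prefix: bytes, target: int, max_tries: int = 1_000_000):
    """Try nonces until the hash, read as an integer, is no greater
    than the target. Returns (nonce, value) or None."""
    for nonce in range(max_tries):
        header = header_prefix + struct.pack("<I", nonce)
        value = int.from_bytes(stand_in_pow_hash(header), "little")
        if value <= target:
            return nonce, value
    return None  # no proof found within max_tries


# A deliberately easy target: roughly 1 in 16 hashes qualifies.
easy_target = (1 << 252) - 1
result = find_nonce(b"example header", easy_target)
```

A real blockchain derives the target from its difficulty rules and
defines the header layout itself; only the hash call would be replaced
by yespower.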


Why or why not yespower?

Different proof-of-work schemes in existence vary in many aspects,
including in friendliness to different types of hardware. There's
demand for all sorts of hardware (un)friendliness in those - for
different use cases and by different communities.

yespower in particular is designed to be CPU-friendly, GPU-unfriendly,
and FPGA/ASIC-neutral. In other words, it's meant to be relatively
efficient to compute on current CPUs and relatively inefficient on
current GPUs. Unfortunately, being GPU-unfriendly also means that
eventual FPGA and ASIC implementations will only compete with CPUs, and
at least ASICs will win over the CPUs (FPGAs might not because of this
market's peculiarities - large FPGAs are even more "over-priced" than
large CPUs are), albeit not nearly to the extent they did e.g. for
Bitcoin and Litecoin.

There's a lot of talk about "ASIC resistance". What is (or should be)
meant by that is limiting the advantage of specialized ASICs. While
limiting the advantage at KDF to e.g. 10x and at password hashing to
e.g. 100x (talking orders of magnitude here, in whatever terms) may be
considered "ASIC resistant" (as compared to e.g. 100,000x we'd have
without trying), similar improvement factors are practically not "ASIC
resistant" for cryptocurrency mining where they can make all the
difference between CPU mining being profitable and not. There might
also exist in-between PoW use cases where moderate ASIC advantage is OK,
such as with non-cryptocurrency and/or private/permissioned blockchains.

Thus, current yespower may be considered either a short-term choice
(valid until one of its uses provides sufficient perceived incentive to
likely result in specialized ASICs) or a deliberate choice of a pro-CPU,
anti-GPU, moderately-pro-ASIC PoW scheme. It is also possible to
respond to known improvements in future GPUs/implementations and/or to
ASICs with new versions of yespower that users would need to switch to.


yespower versions.

yespower includes an optimized and specialized re-implementation of the
obsolete yescrypt 0.5 (based off its first submission to the Password
Hashing Competition back in 2014), now re-released as yespower 0.5, and
a brand new proof-of-work specific variation known as yespower 1.0.

yespower 0.5 is intended as a compatible upgrade for cryptocurrencies
that already use yescrypt 0.5 (providing a few percent speedup), and
yespower 1.0 may be used as a further upgrade or a new choice of PoW by
those and other cryptocurrencies and other projects.

There are many significant differences between yespower 0.5 and 1.0
under the hood, but the main user-visible difference is yespower 1.0
greatly improving on GPU-unfriendliness in light of improvements seen in
modern GPUs (up to and including NVIDIA Volta) and GPU implementations
of yescrypt 0.5. This is achieved mostly through greater use of CPUs'
L2 cache.

The version of the algorithm to use is requested through parameters,
allowing for both algorithms to co-exist in client and miner
implementations (such as in preparation for a cryptocurrency hard-fork
and/or supporting multiple cryptocurrencies in one program).


Parameter selection.

For new uses of yespower, set the requested version to the highest
supported, and set N*r to the highest you can reasonably afford in terms
of proof verification time (which might in turn be determined by desired
share rate per mining pool server), using one of the following options:

1 MiB: N = 1024, r = 8
2 MiB: N = 2048, r = 8
4 MiB: N = 1024, r = 32
8 MiB: N = 2048, r = 32
16 MiB: N = 4096, r = 32

and so on for higher N keeping r=32.
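
The sizes in this list are consistent with the scrypt-style estimate of
128 * N * r bytes, which also matches the "Will use ... KiB RAM" lines
printed by the benchmark program. The short Python sketch below
reproduces the list; the function name is made up for illustration, and
any smaller auxiliary buffers are ignored.

```python
def yespower_memory_bytes(n: int, r: int) -> int:
    """Approximate memory per hash computation: 128 * N * r bytes."""
    return 128 * n * r


for n, r in [(1024, 8), (2048, 8), (1024, 32), (2048, 32), (4096, 32)]:
    mib = yespower_memory_bytes(n, r) / (1024 * 1024)
    print(f"{mib:g} MiB: N = {n}, r = {r}")
```

Doubling N while keeping r = 32 doubles the size, which gives the
"and so on" continuation (32 MiB at N = 8192, for example).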

You may also set the personalization string to your liking, but that is
not required (you can set its pointer to NULL and its length to 0). Its
support is provided mostly for compatibility with existing modifications
of yescrypt 0.5.


Performance.

Please refer to PERFORMANCE for some benchmarks and performance tuning.


How to test yespower for proper operation.

On a Unix-like system, invoke "make check". This will build and run a
program called "tests", and check its output against the supplied file
TESTS-OK. If everything matches, the final line of output should be the
word "PASSED".

We do most of our testing on Linux systems with gcc. The supplied
Makefile assumes that you use gcc.


Alternate code versions and make targets.

Two implementations of yespower are included: reference and optimized.
By default, the optimized implementation is built. Internally, the
optimized implementation uses conditional compilation to choose between
usage of various SIMD instruction sets where supported and scalar code.

The reference implementation is unoptimized and is very slow, but it has
simpler and shorter source code. Its purpose is to provide a simple
human- and machine-readable specification that implementations intended
for actual use should be tested against. It is deliberately left mostly
unoptimized, and it is not meant to be used in production.

Similarly to "make check", there's "make check-ref" to build and test
the reference implementation. There's also "make ref" to build the
reference implementation and have the "benchmark" program use it.

"make clean" may need to be run between making different builds.


How to integrate yespower in a program.

Although yespower.h provides several functions, chances are that you
will only need to use yespower_tls(). Please see the comment on this
function in yespower.h and its example usage in tests.c and benchmark.c,
including parameter sets requesting yescrypt 0.5 as used by certain
existing cryptocurrencies.

To integrate yespower in an altcoin based on Bitcoin Core, you might
invoke yespower_tls() from either a maybe-new (depending on where you
fork from) CBlockHeader::GetPoWHash() (and invoke that where PoW is
needed like e.g. Litecoin does for scrypt) or CBlockHeader::GetHash()
(easier, but inefficient and you'll be stuck with that inefficiency).

You'll also want to implement caching of the computed PoW hashes like
e.g. YACoin does for scrypt. Caching is especially important if you
invoke yespower from CBlockHeader::GetHash(). However, even if you use
or introduce CBlockHeader::GetPoWHash(), caching may still be desirable
as the PoW hash is commonly requested 4 times per block fetched during a
node's initial blockchain sync (once during prefetch of block headers,
and 3 times more during validation of a fully fetched block). On the
other hand, you'll likely want to bypass the cache when PoW is computed
by the node's built-in miner.
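
The caching idea can be sketched as follows. This is Python rather than
Bitcoin Core's C++, the class and parameter names are hypothetical, and
SHA-256 stands in for yespower so that the example is self-contained; a
real node would also have to consider thread safety and memory limits.

```python
import hashlib


class PowHashCache:
    """Cache of PoW hashes keyed by the serialized block header."""

    def __init__(self, pow_hash_fn, max_entries=4096):
        self._fn = pow_hash_fn
        self._max = max_entries
        self._cache = {}
        self.computed = 0  # number of actual (expensive) hash computations

    def get(self, header: bytes, use_cache: bool = True) -> bytes:
        if use_cache and header in self._cache:
            return self._cache[header]
        self.computed += 1
        digest = self._fn(header)
        if use_cache:
            if len(self._cache) >= self._max:
                # Evict the oldest entry (dicts preserve insertion order).
                self._cache.pop(next(iter(self._cache)))
            self._cache[header] = digest
        return digest


cache = PowHashCache(lambda header: hashlib.sha256(header).digest())

# Four requests for the same header, as during initial sync: one computation.
for _ in range(4):
    cache.get(b"header of a fetched block")
print(cache.computed)  # 1

# The built-in miner bypasses the cache: each trial header is hashed once
# and most are never seen again, so storing them would only evict useful
# entries.
cache.get(b"trial header with some nonce", use_cache=False)
print(cache.computed)  # 2
```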

Further detail on this (generating new genesis blocks, etc.) is even
farther from being yespower-specific and thus is not provided here.
Just like (and even more so than) yespower itself, the above guidance is
provided as-is and without guarantee of being correct and safe to
follow. You're supposed to know what you're doing.


Credits.

scrypt has been designed by Colin Percival. yescrypt and yespower have
been designed by Solar Designer building upon scrypt.

The following other people and projects have also indirectly helped make
yespower what it is:

- Bill Cox
- Rich Felker
- Anthony Ferrara
- Christian Forler
- Taylor Hornby
- Dmitry Khovratovich
- Samuel Neves
- Marcos Simplicio
- Ken T Takusagawa
- Jakob Wenzel
- Christian Winnerlein

- DARPA Cyber Fast Track
- Password Hashing Competition


Contact info.

First, please check the yespower homepage for new versions, etc.:

https://www.openwall.com/yespower/

If you have anything valuable to add or a non-trivial question to ask,
you may contact the maintainer of yespower at:

Solar Designer <solar at openwall.com>
16 changes: 16 additions & 0 deletions yespower-1.0.1/TESTS-OK
@@ -0,0 +1,16 @@
yespower(5, 2048, 8, "Client Key") = a5 9f ec 4c 4f dd a1 6e 3b 14 05 ad da 66 d5 25 b6 8e 7c ad fc fe 6a c0 66 c7 ad 11 8c d8 05 90
yespower(5, 2048, 8, BSTY) = 5e a2 b2 95 6a 9e ac e3 0a 32 37 ff 1d 44 1e de e1 dc 25 aa b8 f0 ea 15 c1 21 65 f8 3a 7b c2 65
yespower(5, 4096, 16, "Client Key") = 92 7e 72 d0 de d3 d8 04 75 47 3f 40 f1 74 3c 67 28 9d 45 3d 52 42 d4 f5 5a f4 e3 25 e0 66 99 c5
yespower(5, 4096, 24, "Jagaricoin") = 0e 13 66 97 32 11 e7 fe a8 ad 9d 81 98 9c 84 a2 54 d9 68 c9 d3 33 dd 8f f0 99 32 4f 38 61 1e 04
yespower(5, 4096, 32, "WaviBanana") = 3a e0 5a bb 3c 5c f6 f7 54 15 a9 25 54 c9 8d 50 e3 8e c9 55 2c fa 78 37 36 16 f4 80 b2 4e 55 9f
yespower(5, 2048, 32, "Client Key") = 56 0a 89 1b 5c a2 e1 c6 36 11 1a 9f f7 c8 94 a5 d0 a2 60 2f 43 fd cf a5 94 9b 95 e2 2f e4 46 1e
yespower(5, 1024, 32, "Client Key") = 2a 79 e5 3d 1b e6 66 9b c5 56 cc c4 17 bc e3 d2 2a 74 a2 32 f5 6b 8e 1d 39 b4 57 92 67 5d e1 08
yespower(5, 2048, 8, NULL) = 5e cb d8 e8 d7 c9 0b ae d4 bb f8 91 6a 12 25 dc c3 c6 5f 5c 91 65 ba e8 1c dd e3 cf fa d1 28 e8
yespower(10, 2048, 8, NULL) = 69 e0 e8 95 b3 df 7a ee b8 37 d7 1f e1 99 e9 d3 4f 7e c4 6e cb ca 7a 2c 43 08 e5 18 57 ae 9b 46
yespower(10, 4096, 16, NULL) = 33 fb 8f 06 38 24 a4 a0 20 f6 3d ca 53 5f 5c a6 6a b5 57 64 68 c7 5d 1c ca ac 75 42 f7 64 95 ac
yespower(10, 4096, 32, NULL) = 77 1a ee fd a8 fe 79 a0 82 5b c7 f2 ae e1 62 ab 55 78 57 46 39 ff c6 ca 37 23 cc 18 e5 e3 e2 85
yespower(10, 2048, 32, NULL) = d5 ef b8 13 cd 26 3e 9b 34 54 01 30 23 3c bb c6 a9 21 fb ff 34 31 e5 ec 1a 1a bd e2 ae a6 ff 4d
yespower(10, 1024, 32, NULL) = 50 1b 79 2d b4 2e 38 8f 6e 7d 45 3c 95 d0 3a 12 a3 60 16 a5 15 4a 68 83 90 dd c6 09 a4 0c 67 99
yespower(10, 1024, 32, "personality test") = 1f 02 69 ac f5 65 c4 9a dc 0e f9 b8 f2 6a b3 80 8c dc 38 39 4a 25 4f dd ee dc c3 aa cf f6 ad 9d
XOR of yespower(5, ...) = ae f1 32 91 87 0f 55 70 47 f4 2e 9b ef a6 16 df e5 f1 96 77 e1 3f 8b a6 92 f7 c5 97 55 a0 f5 0e
XOR of yespower(10, ...) = 8d 13 c5 fb 07 30 96 75 d1 b8 48 92 77 ba 4b e4 40 33 be df ae 7a 60 43 8a 9b e2 1f 3a 7b 12 37