forked from yentencoin/yenten-arm-miner-yespowerr16
1 parent f9ef30f, commit ffd26ad
Showing 14 changed files with 3,626 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
Changes made between 1.0.0 (2018/07/12) and 1.0.1 (2019/06/30). | ||
|
||
Fill the destination buffer with all set bits on error for fail-safety | ||
of the caller's "< target" check in case the caller neglects to check | ||
for errors. | ||
|
||
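To illustrate why an all-ones buffer is fail-safe, here is a minimal Python sketch. It assumes the caller interprets the 32-byte result as an unsigned 256-bit integer and accepts a proof only when that value is below the target. The little-endian byte order and all names below are assumptions made for illustration; they are not taken from the yespower sources.

```python
HASH_LEN = 32  # the test vectors in this commit are 32 bytes long


def meets_target(digest: bytes, target: int) -> bool:
    """Caller-side "< target" check on a 256-bit value (byte order assumed)."""
    return int.from_bytes(digest, "little") < target


# What the destination buffer holds after a failed call: every bit set.
error_fill = b"\xff" * HASH_LEN

# Even the most permissive 256-bit target rejects the error fill,
# so a caller that forgets to check the return code cannot accept it.
easiest_target = (1 << 256) - 1
print(meets_target(error_fill, easiest_target))
```

Because every byte is 0xff, the result is the same regardless of the byte order the caller uses.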
Simplified SMix2 for its final invocation with Nloop=2 in yespower 0.5. | ||
|
||
Revised the "XOR of yespower" tests to trigger duplicate index in the | ||
last SMix2 invocation in yespower 0.5 for N=2048 with at least one of | ||
the values of r being tested. This is needed to test that a proper | ||
kind of BlockMix is used in that special case, which would previously be | ||
left untested. | ||
|
||
Added x32 ABI support (x86-64 with 32-bit pointers). | ||
|
||
Added a bit more detail to the README on caching of the computed PoW | ||
hashes when integrating yespower in an altcoin based on Bitcoin Core. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
Included with yespower is the "benchmark" program, which is built by | ||
simply invoking "make". When invoked without parameters, it tests | ||
yespower 0.5 at N = 2048, r = 8, which appears to be the lowest setting | ||
in use by existing cryptocurrencies. On an i7-4770K with 4x DDR3-1600 | ||
(on two memory channels) running CentOS 7 for x86-64 (and built with | ||
CentOS 7's default version of gcc) and with thread affinity set, this | ||
reports between 3700 and 3800 hashes per second for both SSE2 and AVX | ||
builds, e.g.: | ||
|
||
$ GOMP_CPU_AFFINITY=0-7 OMP_NUM_THREADS=4 ./benchmark | ||
version=0.5 N=2048 r=8 | ||
Will use 2048.00 KiB RAM | ||
a5 9f ec 4c 4f dd a1 6e 3b 14 05 ad da 66 d5 25 b6 8e 7c ad fc fe 6a c0 66 c7 ad 11 8c d8 05 90 | ||
Benchmarking 1 thread ... | ||
1018 H/s real, 1018 H/s virtual (2047 hashes in 2.01 seconds) | ||
Benchmarking 4 threads ... | ||
3773 H/s real, 950 H/s virtual (8188 hashes in 2.17 seconds) | ||
min 0.984 ms, avg 1.052 ms, max 1.074 ms | ||
|
||
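The "H/s real" figures in the output above are consistent with dividing the number of hashes by the elapsed time and dropping the fraction. The sketch below recomputes them from the printed numbers. The truncation is inferred from the printed values (the printed seconds are themselves rounded), not confirmed against the benchmark source, and the function name is hypothetical.

```python
def hashes_per_second(hashes: int, seconds: float) -> int:
    """Throughput as whole hashes per second (fraction dropped)."""
    return int(hashes / seconds)


# Numbers copied from the yespower 0.5 run above.
one_thread = hashes_per_second(2047, 2.01)
four_threads = hashes_per_second(8188, 2.17)

print(one_thread, four_threads)
print(f"scaling from 1 to 4 threads: {four_threads / one_thread:.2f}x")
```

The same arithmetic can be used to compare thread counts or builds on another system.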
Running 8 threads (to match the logical rather than the physical CPU | ||
core count) results in very slightly worse performance on this system, | ||
but this might be the other way around on another system and/or with | ||
other parameters. Upgrading to yespower 1.0, performance at these | ||
parameters improves to almost 4000 hashes per second: | ||
|
||
$ GOMP_CPU_AFFINITY=0-7 OMP_NUM_THREADS=4 ./benchmark 10 | ||
version=1.0 N=2048 r=8 | ||
Will use 2048.00 KiB RAM | ||
d0 78 cd d4 cf 3f 5a a8 4e 3c 4a 58 66 29 81 d8 2d 27 e5 67 36 37 c4 be 77 63 61 32 24 c1 8a 93 | ||
Benchmarking 1 thread ... | ||
1080 H/s real, 1080 H/s virtual (4095 hashes in 3.79 seconds) | ||
Benchmarking 4 threads ... | ||
3995 H/s real, 1011 H/s virtual (16380 hashes in 4.10 seconds) | ||
min 0.923 ms, avg 0.989 ms, max 1.137 ms | ||
|
||
Running 8 threads results in a substantial slowdown with this new version | ||
(to between 3200 and 3400 hashes per second) because of cache thrashing. | ||
|
||
For higher settings such as those achieving 8 MiB instead of the 2 MiB | ||
above, this system performs at around 800 hashes per second for yespower | ||
0.5 and at around 830 hashes per second for yespower 1.0: | ||
|
||
$ GOMP_CPU_AFFINITY=0-7 OMP_NUM_THREADS=4 ./benchmark 5 2048 32 | ||
version=0.5 N=2048 r=32 | ||
Will use 8192.00 KiB RAM | ||
56 0a 89 1b 5c a2 e1 c6 36 11 1a 9f f7 c8 94 a5 d0 a2 60 2f 43 fd cf a5 94 9b 95 e2 2f e4 46 1e | ||
Benchmarking 1 thread ... | ||
265 H/s real, 265 H/s virtual (1023 hashes in 3.85 seconds) | ||
Benchmarking 4 threads ... | ||
803 H/s real, 200 H/s virtual (4092 hashes in 5.09 seconds) | ||
min 4.924 ms, avg 4.980 ms, max 5.074 ms | ||
|
||
$ GOMP_CPU_AFFINITY=0-7 OMP_NUM_THREADS=4 ./benchmark 10 2048 32 | ||
version=1.0 N=2048 r=32 | ||
Will use 8192.00 KiB RAM | ||
f7 69 26 ae 4a dc 56 53 90 2f f0 22 78 ea aa 39 eb 99 84 11 ac 3e a6 24 2e 19 6d fb c4 3d 68 25 | ||
Benchmarking 1 thread ... | ||
275 H/s real, 275 H/s virtual (1023 hashes in 3.71 seconds) | ||
Benchmarking 4 threads ... | ||
831 H/s real, 209 H/s virtual (4092 hashes in 4.92 seconds) | ||
min 3.614 ms, avg 4.769 ms, max 5.011 ms | ||
|
||
Again, running 8 threads results in a slowdown, albeit not as bad as can | ||
be seen for lower settings. | ||
|
||
On x86(-64), the following code versions may reasonably be built: SSE2, | ||
AVX, and XOP. (There's no reason to build for AVX2 and higher, which are | ||
unsuitable for and thus unused by current yespower anyway. There's also | ||
no reason to build yespower as-is for SSE4, although there's a | ||
32-bit-specific SSE4 code version, disabled by default, that may be | ||
re-enabled and given a try if someone is so inclined; it may perform | ||
slightly slower or slightly faster across different systems.) | ||
|
||
yescrypt and especially yespower 1.0 have been designed to fit the SSE2 | ||
instruction set almost perfectly, so there's very little benefit from | ||
the AVX and XOP builds, yet even at yespower 1.0 there may be | ||
performance differences between SSE2, AVX, and XOP builds within 2% or | ||
so (and it is unclear which is the fastest on a given system until | ||
tested, except that where XOP is supported it is almost always faster | ||
than AVX). | ||
|
||
Proper setting of thread affinities to run exactly one thread per | ||
physical CPU core is non-trivial. In the above examples, it so happened | ||
that the first 4 logical CPU numbers corresponded to different physical | ||
cores, but this won't always be the case. This can vary even between | ||
apparently similar systems. On Linux, the mapping of logical CPUs to | ||
physical cores may be obtained from /proc/cpuinfo (on x86[-64] and MIC) | ||
or sysfs, which an optimized implementation of e.g. a cryptocurrency | ||
miner could use. If you do not bother obtaining this information from | ||
the operating system, you might be better off not setting thread | ||
affinities at all (in order to avoid the risk of doing this incorrectly, | ||
which would have a greater negative performance impact) and/or running | ||
as many threads as there are logical CPUs. Also, there's no certainty | ||
whether different and future CPUs will run yespower faster using one or | ||
maybe more threads per physical core. |
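
As a rough illustration of obtaining the logical-to-physical mapping, the Python sketch below parses /proc/cpuinfo-style text and picks one logical CPU per physical core. It assumes the x86 field names "processor", "physical id", and "core id"; where the latter two are missing, it falls back to treating every logical CPU as its own core, which is an assumption of this sketch. The function name and the sample text are hypothetical.

```python
import re

# Hypothetical sample: 2 physical cores, 2 logical CPUs each. Note that the
# first two logical CPUs share a core here, unlike in the examples above.
SAMPLE_CPUINFO = """\
processor\t: 0
physical id\t: 0
core id\t\t: 0

processor\t: 1
physical id\t: 0
core id\t\t: 0

processor\t: 2
physical id\t: 0
core id\t\t: 1

processor\t: 3
physical id\t: 0
core id\t\t: 1
"""


def one_cpu_per_core(cpuinfo_text):
    """Return the lowest-numbered logical CPU of each physical core."""
    chosen = {}  # (physical id, core id) -> logical CPU number
    for block in re.split(r"\n\s*\n", cpuinfo_text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        cpu = int(fields["processor"])
        core = (fields.get("physical id", "0"), fields.get("core id", str(cpu)))
        chosen[core] = min(cpu, chosen.get(core, cpu))
    return sorted(chosen.values())


print(one_cpu_per_core(SAMPLE_CPUINFO))
```

On a real Linux system the input would be the contents of /proc/cpuinfo, and the resulting list could be joined into a value for GOMP_CPU_AFFINITY, with OMP_NUM_THREADS set to its length.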
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,203 @@ | ||
What is yespower? | ||
|
||
yespower is a proof-of-work (PoW) focused fork of yescrypt. While | ||
yescrypt is a password-based key derivation function (KDF) and password | ||
hashing scheme, and thus is meant for processing passwords, yespower is | ||
meant for processing trial inputs such as block headers (including | ||
nonces) in PoW-based blockchains. | ||
|
||
On its own, yespower isn't a complete proof-of-work system. Rather, in | ||
the blockchain use case, yespower's return value is meant to be checked | ||
for being numerically no greater than the blockchain's current target | ||
(which is related to mining difficulty) or else the proof attempt | ||
(yespower invocation) is to be repeated (with a different nonce) until | ||
the condition is finally met (allowing a new block to be mined). This | ||
process isn't specific to yespower and isn't part of yespower itself | ||
(rather, it is similar in many PoW-based blockchains and is to be | ||
defined and implemented externally to yespower) and thus isn't described | ||
in here any further. | ||
|
||
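The external loop described above can be sketched as follows. This is a toy illustration only: SHA-256 stands in for yespower so that the example runs without any bindings, and the header layout, the 4-byte nonce, the little-endian interpretation of the hash, and all names are assumptions rather than anything defined by yespower.

```python
import hashlib


def pow_hash(header: bytes) -> bytes:
    # Stand-in only: a real integration would invoke yespower here.
    return hashlib.sha256(header).digest()


def mine(header_prefix: bytes, target: int, max_nonce: int = 1 << 32):
    """Try nonces until the hash value is numerically no greater than target."""
    for nonce in range(max_nonce):
        header = header_prefix + nonce.to_bytes(4, "little")
        value = int.from_bytes(pow_hash(header), "little")
        if value <= target:
            return nonce, value
    return None  # nonce space exhausted without meeting the target


# Toy target: roughly 1 in 256 attempts is expected to succeed.
target = 1 << 248
result = mine(b"example block header", target)
print(result is not None)
```

A lower target makes success rarer, which is how the target relates to mining difficulty.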
|
||
Why or why not yespower? | ||
|
||
Different proof-of-work schemes in existence vary in many aspects, | ||
including in friendliness to different types of hardware. There's | ||
demand for all sorts of hardware (un)friendliness in those - for | ||
different use cases and by different communities. | ||
|
||
yespower in particular is designed to be CPU-friendly, GPU-unfriendly, | ||
and FPGA/ASIC-neutral. In other words, it's meant to be relatively | ||
efficient to compute on current CPUs and relatively inefficient on | ||
current GPUs. Unfortunately, being GPU-unfriendly also means that | ||
eventual FPGA and ASIC implementations will only compete with CPUs, and | ||
at least ASICs will win over the CPUs (FPGAs might not because of this | ||
market's peculiarities - large FPGAs are even more "over-priced" than | ||
large CPUs are), albeit not nearly to the extent they did e.g. for | ||
Bitcoin and Litecoin. | ||
|
||
There's a lot of talk about "ASIC resistance". What is (or should be) | ||
meant by that is limiting the advantage of specialized ASICs. While | ||
limiting the advantage at KDF to e.g. 10x and at password hashing to | ||
e.g. 100x (talking orders of magnitude here, in whatever terms) may be | ||
considered "ASIC resistant" (as compared to e.g. 100,000x we'd have | ||
without trying), similar improvement factors are practically not "ASIC | ||
resistant" for cryptocurrency mining where they can make all the | ||
difference between CPU mining being profitable and not. There might | ||
also exist in-between PoW use cases where moderate ASIC advantage is OK, | ||
such as with non-cryptocurrency and/or private/permissioned blockchains. | ||
|
||
Thus, current yespower may be considered either a short-term choice | ||
(valid until one of its uses provides sufficient perceived incentive to | ||
likely result in specialized ASICs) or a deliberate choice of a pro-CPU, | ||
anti-GPU, moderately-pro-ASIC PoW scheme. It is also possible to | ||
respond to known improvements in future GPUs/implementations and/or to | ||
ASICs with new versions of yespower that users would need to switch to. | ||
|
||
|
||
yespower versions. | ||
|
||
yespower includes an optimized and specialized re-implementation of the | ||
obsolete yescrypt 0.5 (based off its first submission to the Password | ||
Hashing Competition back in 2014), now re-released as yespower 0.5, and | ||
a brand new proof-of-work specific variation known as yespower 1.0. | ||
|
||
yespower 0.5 is intended as a compatible upgrade for cryptocurrencies | ||
that already use yescrypt 0.5 (providing a few percent speedup), and | ||
yespower 1.0 may be used as a further upgrade or a new choice of PoW by | ||
those and other cryptocurrencies and other projects. | ||
|
||
There are many significant differences between yespower 0.5 and 1.0 | ||
under the hood, but the main user-visible difference is that yespower | ||
1.0 greatly improves on GPU-unfriendliness in light of improvements seen | ||
in modern GPUs (up to and including NVIDIA Volta) and GPU | ||
implementations of yescrypt 0.5. This is achieved mostly through | ||
greater use of CPUs' L2 cache. | ||
|
||
The version of the algorithm to use is requested through parameters, | ||
allowing both algorithms to co-exist in client and miner | ||
implementations (such as in preparation for a cryptocurrency hard-fork | ||
and/or supporting multiple cryptocurrencies in one program). | ||
|
||
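As a sketch of how both versions might co-exist around a hard-fork, the Python below selects a parameter set by block height. The version numbers 5 and 10 are the ones that appear in the test vectors and benchmark invocations in this commit; the Params class, the fork height, and the particular N and r choices are hypothetical and only mirror the idea of a per-call parameter set.

```python
from dataclasses import dataclass
from typing import Optional

VERSION_0_5 = 5   # written as yespower(5, ...) in the test vectors
VERSION_1_0 = 10  # written as yespower(10, ...) in the test vectors


@dataclass(frozen=True)
class Params:
    """Hypothetical mirror of a yespower parameter set."""
    version: int
    N: int
    r: int
    pers: Optional[bytes] = None  # personalization string, may be absent


BEFORE_FORK = Params(VERSION_0_5, 2048, 8, b"Client Key")
AFTER_FORK = Params(VERSION_1_0, 2048, 32)
FORK_HEIGHT = 100  # hypothetical activation height


def params_at_height(height: int) -> Params:
    """Pick the parameter set that applies to a block at this height."""
    return AFTER_FORK if height >= FORK_HEIGHT else BEFORE_FORK


print(params_at_height(99).version, params_at_height(100).version)
```

The same lookup idea extends to a program that supports several cryptocurrencies, with one parameter set per chain.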
|
||
Parameter selection. | ||
|
||
For new uses of yespower, set the requested version to the highest | ||
supported, and set N*r to the highest you can reasonably afford in terms | ||
of proof verification time (which might in turn be determined by desired | ||
share rate per mining pool server), using one of the following options: | ||
|
||
1 MiB: N = 1024, r = 8 | ||
2 MiB: N = 2048, r = 8 | ||
4 MiB: N = 1024, r = 32 | ||
8 MiB: N = 2048, r = 32 | ||
16 MiB: N = 4096, r = 32 | ||
|
||
and so on for higher N keeping r=32. | ||
|
||
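The sizes listed above are consistent with the scrypt-style relation of 128 * N * r bytes. The short Python sketch below reproduces the table from that relation; the formula is inferred here from the listed options and from the "Will use ... KiB RAM" lines in the benchmark output, and the function name is hypothetical.

```python
def memory_bytes(N: int, r: int) -> int:
    """Approximate memory footprint implied by the options listed above."""
    return 128 * N * r


OPTIONS = [(1024, 8), (2048, 8), (1024, 32), (2048, 32), (4096, 32)]

for N, r in OPTIONS:
    print(f"{memory_bytes(N, r) // (1 << 20)} MiB: N = {N}, r = {r}")
```

Doubling N at r = 32 doubles the footprint, which is how the list continues beyond 16 MiB.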
You may also set the personalization string to your liking, but that is | ||
not required (you can set its pointer to NULL and its length to 0). Its | ||
support is provided mostly for compatibility with existing modifications | ||
of yescrypt 0.5. | ||
|
||
|
||
Performance. | ||
|
||
Please refer to PERFORMANCE for some benchmarks and performance tuning. | ||
|
||
|
||
How to test yespower for proper operation. | ||
|
||
On a Unix-like system, invoke "make check". This will build and run a | ||
program called "tests", and check its output against the supplied file | ||
TESTS-OK. If everything matches, the final line of output should be the | ||
word "PASSED". | ||
|
||
We do most of our testing on Linux systems with gcc. The supplied | ||
Makefile assumes that you use gcc. | ||
|
||
|
||
Alternate code versions and make targets. | ||
|
||
Two implementations of yespower are included: reference and optimized. | ||
By default, the optimized implementation is built. Internally, the | ||
optimized implementation uses conditional compilation to choose between | ||
usage of various SIMD instruction sets where supported and scalar code. | ||
|
||
The reference implementation is unoptimized and is very slow, but it has | ||
simpler and shorter source code. Its purpose is to provide a simple | ||
human- and machine-readable specification that implementations intended | ||
for actual use should be tested against. It is deliberately mostly not | ||
optimized, and it is not meant to be used in production. | ||
|
||
Similarly to "make check", there's "make check-ref" to build and test | ||
the reference implementation. There's also "make ref" to build the | ||
reference implementation and have the "benchmark" program use it. | ||
|
||
"make clean" may need to be run between making different builds. | ||
|
||
|
||
How to integrate yespower in a program. | ||
|
||
Although yespower.h provides several functions, chances are that you | ||
will only need to use yespower_tls(). Please see the comment on this | ||
function in yespower.h and its example usage in tests.c and benchmark.c, | ||
including parameter sets requesting yescrypt 0.5 as used by certain | ||
existing cryptocurrencies. | ||
|
||
To integrate yespower in an altcoin based on Bitcoin Core, you might | ||
invoke yespower_tls() from either a maybe-new (depending on where you | ||
fork from) CBlockHeader::GetPoWHash() (and invoke that where PoW is | ||
needed like e.g. Litecoin does for scrypt) or CBlockHeader::GetHash() | ||
(easier, but inefficient and you'll be stuck with that inefficiency). | ||
|
||
You'll also want to implement caching of the computed PoW hashes like | ||
e.g. YACoin does for scrypt. Caching is especially important if you | ||
invoke yespower from CBlockHeader::GetHash(). However, even if you use | ||
or introduce CBlockHeader::GetPoWHash(), caching may still be desirable | ||
as the PoW hash is commonly requested 4 times per block fetched during a | ||
node's initial blockchain sync (once during prefetch of block headers, | ||
and 3 times more during validation of a fully fetched block). On the | ||
other hand, you'll likely want to bypass the cache when PoW is computed | ||
by the node's built-in miner. | ||
|
||
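The caching idea above can be sketched in Python as follows. This is not Bitcoin Core or YACoin code: the class, its size limit, and the eviction policy are hypothetical, and SHA-256 again stands in for yespower so that the example is runnable.

```python
import hashlib


class PowHashCache:
    """Hypothetical cache of PoW hashes keyed by serialized block header."""

    def __init__(self, pow_hash, max_entries=4096):
        self._pow_hash = pow_hash
        self._max_entries = max_entries
        self._cache = {}   # dicts keep insertion order, oldest entry first
        self.computed = 0  # number of actual PoW computations performed

    def get(self, header: bytes, use_cache: bool = True) -> bytes:
        if use_cache and header in self._cache:
            return self._cache[header]
        self.computed += 1
        digest = self._pow_hash(header)
        if use_cache:
            if len(self._cache) >= self._max_entries:
                self._cache.pop(next(iter(self._cache)))  # evict the oldest
            self._cache[header] = digest
        return digest


def stand_in(header: bytes) -> bytes:
    # Stand-in only: a real node would invoke yespower here.
    return hashlib.sha256(header).digest()


cache = PowHashCache(stand_in)
for _ in range(4):  # the 4 requests per block mentioned above
    cache.get(b"header of a fetched block")
print(cache.computed)

# The built-in miner tries many distinct headers, so it bypasses the cache.
cache.get(b"candidate header with some nonce", use_cache=False)
```

With the cache in place, the four requests for the same header cost one computation instead of four.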
Further detail on this (generating new genesis blocks, etc.) is even | ||
farther from being yespower-specific and thus is not provided here. | ||
Just like (and even more so than) yespower itself, the above guidance is | ||
provided as-is and without guarantee of being correct and safe to | ||
follow. You're supposed to know what you're doing. | ||
|
||
|
||
Credits. | ||
|
||
scrypt has been designed by Colin Percival. yescrypt and yespower have | ||
been designed by Solar Designer building upon scrypt. | ||
|
||
The following other people and projects have also indirectly helped make | ||
yespower what it is: | ||
|
||
- Bill Cox | ||
- Rich Felker | ||
- Anthony Ferrara | ||
- Christian Forler | ||
- Taylor Hornby | ||
- Dmitry Khovratovich | ||
- Samuel Neves | ||
- Marcos Simplicio | ||
- Ken T Takusagawa | ||
- Jakob Wenzel | ||
- Christian Winnerlein | ||
|
||
- DARPA Cyber Fast Track | ||
- Password Hashing Competition | ||
|
||
|
||
Contact info. | ||
|
||
First, please check the yespower homepage for new versions, etc.: | ||
|
||
https://www.openwall.com/yespower/ | ||
|
||
If you have anything valuable to add or a non-trivial question to ask, | ||
you may contact the maintainer of yespower at: | ||
|
||
Solar Designer <solar at openwall.com> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
yespower(5, 2048, 8, "Client Key") = a5 9f ec 4c 4f dd a1 6e 3b 14 05 ad da 66 d5 25 b6 8e 7c ad fc fe 6a c0 66 c7 ad 11 8c d8 05 90 | ||
yespower(5, 2048, 8, BSTY) = 5e a2 b2 95 6a 9e ac e3 0a 32 37 ff 1d 44 1e de e1 dc 25 aa b8 f0 ea 15 c1 21 65 f8 3a 7b c2 65 | ||
yespower(5, 4096, 16, "Client Key") = 92 7e 72 d0 de d3 d8 04 75 47 3f 40 f1 74 3c 67 28 9d 45 3d 52 42 d4 f5 5a f4 e3 25 e0 66 99 c5 | ||
yespower(5, 4096, 24, "Jagaricoin") = 0e 13 66 97 32 11 e7 fe a8 ad 9d 81 98 9c 84 a2 54 d9 68 c9 d3 33 dd 8f f0 99 32 4f 38 61 1e 04 | ||
yespower(5, 4096, 32, "WaviBanana") = 3a e0 5a bb 3c 5c f6 f7 54 15 a9 25 54 c9 8d 50 e3 8e c9 55 2c fa 78 37 36 16 f4 80 b2 4e 55 9f | ||
yespower(5, 2048, 32, "Client Key") = 56 0a 89 1b 5c a2 e1 c6 36 11 1a 9f f7 c8 94 a5 d0 a2 60 2f 43 fd cf a5 94 9b 95 e2 2f e4 46 1e | ||
yespower(5, 1024, 32, "Client Key") = 2a 79 e5 3d 1b e6 66 9b c5 56 cc c4 17 bc e3 d2 2a 74 a2 32 f5 6b 8e 1d 39 b4 57 92 67 5d e1 08 | ||
yespower(5, 2048, 8, NULL) = 5e cb d8 e8 d7 c9 0b ae d4 bb f8 91 6a 12 25 dc c3 c6 5f 5c 91 65 ba e8 1c dd e3 cf fa d1 28 e8 | ||
yespower(10, 2048, 8, NULL) = 69 e0 e8 95 b3 df 7a ee b8 37 d7 1f e1 99 e9 d3 4f 7e c4 6e cb ca 7a 2c 43 08 e5 18 57 ae 9b 46 | ||
yespower(10, 4096, 16, NULL) = 33 fb 8f 06 38 24 a4 a0 20 f6 3d ca 53 5f 5c a6 6a b5 57 64 68 c7 5d 1c ca ac 75 42 f7 64 95 ac | ||
yespower(10, 4096, 32, NULL) = 77 1a ee fd a8 fe 79 a0 82 5b c7 f2 ae e1 62 ab 55 78 57 46 39 ff c6 ca 37 23 cc 18 e5 e3 e2 85 | ||
yespower(10, 2048, 32, NULL) = d5 ef b8 13 cd 26 3e 9b 34 54 01 30 23 3c bb c6 a9 21 fb ff 34 31 e5 ec 1a 1a bd e2 ae a6 ff 4d | ||
yespower(10, 1024, 32, NULL) = 50 1b 79 2d b4 2e 38 8f 6e 7d 45 3c 95 d0 3a 12 a3 60 16 a5 15 4a 68 83 90 dd c6 09 a4 0c 67 99 | ||
yespower(10, 1024, 32, "personality test") = 1f 02 69 ac f5 65 c4 9a dc 0e f9 b8 f2 6a b3 80 8c dc 38 39 4a 25 4f dd ee dc c3 aa cf f6 ad 9d | ||
XOR of yespower(5, ...) = ae f1 32 91 87 0f 55 70 47 f4 2e 9b ef a6 16 df e5 f1 96 77 e1 3f 8b a6 92 f7 c5 97 55 a0 f5 0e | ||
XOR of yespower(10, ...) = 8d 13 c5 fb 07 30 96 75 d1 b8 48 92 77 ba 4b e4 40 33 be df ae 7a 60 43 8a 9b e2 1f 3a 7b 12 37 |