diff --git a/docs/img/SFNNv8_architecture.drawio b/docs/img/SFNNv8_architecture.drawio
new file mode 100644
index 00000000..a3a8825d
--- /dev/null
+++ b/docs/img/SFNNv8_architecture.drawio
@@ -0,0 +1,876 @@
[876 lines of draw.io XML content omitted]
diff --git a/docs/img/SFNNv8_architecture.svg b/docs/img/SFNNv8_architecture.svg
new file mode 100644
index 00000000..161d95e1
--- /dev/null
+++ b/docs/img/SFNNv8_architecture.svg
@@ -0,0 +1,3 @@

[SVG text labels — Stockfish SFNNv8 evaluation network (simplified diagram):
 Two HalfKAv2_hm inputs (side-to-move and other-side perspectives, 22528 features each) feed the feature transformer; each perspective's output is split into halves a and b (1280 each), passed through ClippedReLU [0, 1] and combined by element-wise multiplication (out = a * b * (127/128)), giving the 2560-wide input of a LayerStack of 8 subnets. The subnet is chosen by (popcount(pos.pieces()) - 1) / 4, and the PSQT perspectives are averaged as (our - their) / 2.
 Each subnet: Linear 2560->16; the 15 hidden values pass through SqrClippedReLU [0, 1] (out = crelu(x*x*(127/128))) and ClippedReLU [0, 1]; then Linear 30->32, ClippedReLU [0, 1], Linear 32->1.
 Note from the diagram: the multiplication by (127/128) is performed because 127 is the unity after quantization, but we cannot efficiently divide by 127, so we divide by 128 instead; this effectively restricts the output range to [0, 0.9921875] instead of [0, 1]. The same factor is applied before the crelu in SqrClippedReLU.]
\ No newline at end of file
diff --git a/docs/img/SFNNv8_architecture_detailed.drawio b/docs/img/SFNNv8_architecture_detailed.drawio
new file mode 100644
index 00000000..7602de6e
--- /dev/null
+++ b/docs/img/SFNNv8_architecture_detailed.drawio
@@ -0,0 +1,915 @@
[915 lines of draw.io XML content omitted]
diff --git a/docs/img/SFNNv8_architecture_detailed.svg b/docs/img/SFNNv8_architecture_detailed.svg
new file mode 100644
index 00000000..88ae7fcc
--- /dev/null
+++ b/docs/img/SFNNv8_architecture_detailed.svg
@@ -0,0 +1,3 @@
[SVG text labels — Stockfish SFNNv8 evaluation network diagram (detailed):
 Single Perspective Subnet: HalfKAv2_hm feature set, in[22528] : bool (implicit) -> Sparse Linear (out = in * weight + bias; out[2560] : FP<i16, 127>, weight[2560][22528] : FP<i16, 127>, bias[2560] : FP<i16, 127>) -> ClippedReLU (out = clamp(in, 0, 1)); plus a psq Sparse Linear (out[8] : FP<i32, 600*16>, weight[8][22528] : FP<i32, 600*16>, bias[8] : FP<i32, 600*16>).
 Each perspective's [2560] output is split into a[1280] and b[1280] and combined by Element-wise multiply (out = a * b * 127 / 128; out[1280] : FP<i8, 127>); the side-to-move and other-side results form the [2560] : FP<i8, 127> input of the Main Subnet, while the psq outputs pass through Average Perspectives (out = (our - their) / 2).
 Main Subnet: Dense Linear<2560, 16> produces [16] : FP<i32, 127*64>, split into [15] (passed through both SqrClippedReLU<15> and ClippedReLU<15> and concatenated to [30]) and [1] (forwarded); then a Dense Linear, ClippedReLU<32> and Dot Product For Output<32> give [1] : FP<i32, 600*16>. One of 8 Main Subnets is chosen with index (piece_count - 1) / 4 (buckets 0, 1..6, 7); Convert to Value: out = in * 600, in[1] : FP<i32, 600*16>, out[1] : FP<Value, 16>.
 Block definitions: Dense Linear<Ins, Outs>: out = in * weight + bias, in[Ins] : FP<i8, 127>, out[Outs] : FP<i32, 127*64>, weight[Outs][Ins] : FP<i8, 64>, bias[Outs] : FP<i32, 127*64>. ClippedReLU<Size>: out = clamp(in, 0, 1), in[Size] : FP<i32, 127*64>, out[Size] : FP<i8, 127>. SqrClippedReLU<Size>: temp = in*in*127/128, out = clamp(temp, 0, 1), in[Size] : FP<i32, 127*64>, out[Size] : FP<i8, 127>. Dot Product For Output<Size>: out = in * weight + bias, in[Size] : FP<i8, 127>, weight[Size] : FP<i8, 600*16/127>, bias[1] : FP<i32, 600*16>, out[1] : FP<i32, 600*16>.
 Notes: FP<IntT, OneV> is a fixed-point type represented by an integer of type IntT, with the value 1.0 represented by OneV (the value of unity does not need to be an integer). Value is Stockfish's internal evaluation unit. Multiplications by (127/128) are performed to match the fast quantized implementation, which cannot use division by 127 due to performance restrictions.]
\ No newline at end of file
diff --git a/docs/img/SFNNv8_architecture_detailed_v2.drawio b/docs/img/SFNNv8_architecture_detailed_v2.drawio
new file mode 100644
index 00000000..9c5c793a
--- /dev/null
+++ b/docs/img/SFNNv8_architecture_detailed_v2.drawio
@@ -0,0 +1,911 @@
[911 lines of draw.io XML content omitted]
diff --git a/docs/img/SFNNv8_architecture_detailed_v2.svg b/docs/img/SFNNv8_architecture_detailed_v2.svg
new file mode 100644
index 00000000..10f1cc73
--- /dev/null
+++ b/docs/img/SFNNv8_architecture_detailed_v2.svg
@@ -0,0 +1,3 @@
[SVG text labels — the same detailed SFNNv8 diagram as above, plus an added note on the input features:
 HalfKAv2_hm input features: for each piece on the board, with locations adjusted (flipped) for the given perspective, activate the feature indexed by the following tuple:
 1. KSqH[32] — our king's square mapped (mirrored) to half of the board such that A1 == H1
 2. PcSq[64] — the square the piece is on
 3. PcK[11] — essentially piece.type * 2 + piece.is_our; kings are mapped to a single index]
\ No newline at end of file
diff --git a/docs/nnue.md b/docs/nnue.md
index e963c3ad..017aa09f 100644
--- a/docs/nnue.md
+++ b/docs/nnue.md
@@ -127,6 +127,7 @@ What this document DOES NOT contain:
     + [A part of the feature transformer directly forwarded to the output.](#a-part-of-the-feature-transformer-directly-forwarded-to-the-output)
     + [Multiple PSQT outputs and multiple subnetworks](#multiple-psqt-outputs-and-multiple-subnetworks)
 * [Historical Stockfish evaluation network architectures](#historical-stockfish-evaluation-network-architectures)
+    + ["SFNNv8" architecture](#sfnnv8-architecture)
     + ["SFNNv7" architecture](#sfnnv7-architecture)
     + ["SFNNv6" architecture](#sfnnv6-architecture)
     + ["SFNNv5" architecture](#sfnnv5-architecture)
@@ -2957,11 +2958,21 @@ y = self.layer_stacks(l0_, layer_stack_indices) + (wpsqt - bpsqt) * (us - 0.5)
 
 ## Historical Stockfish evaluation network architectures
 
+### "SFNNv8" architecture
+
+Same as "SFNNv5" with L1 size increased to 2560.
+
+2023-09-22 - *
+
+[Commit 70ba9de85cddc5460b1ec53e0a99bee271e26ece](https://github.com/official-stockfish/Stockfish/commit/70ba9de85cddc5460b1ec53e0a99bee271e26ece)
+
+![](img/SFNNv8_architecture_detailed_v2.svg)
+
 ### "SFNNv7" architecture
 
-Same as "SFNNv6" with L1 size increased from 1536 to 2048.
+Same as "SFNNv5" with L1 size increased to 2048.
 
-2023-07-01 - *
+2023-07-01 - 2023-09-22
 
 [Commit 915532181f11812c80ef0b57bc018de4ea2155ec](https://github.com/official-stockfish/Stockfish/commit/915532181f11812c80ef0b57bc018de4ea2155ec)
 
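For readers comparing the historical entries, the following is a minimal, unquantized PyTorch sketch of the topology summarized in the SFNNv8 entry above (the SFNNv5 structure with L1 increased to 2560). It is a sketch under assumptions, not the nnue-pytorch model: all class and variable names are illustrative, the sparse feature transformer is replaced by a dense `nn.Linear`, and the 127/128 quantization factors and PSQT handling are omitted; the diagrams added in this patch remain the reference.

```python
import torch
import torch.nn as nn

NUM_FEATURES = 22528   # HalfKAv2_hm features per perspective
L1 = 2560              # SFNNv8; SFNNv7 used 2048, SFNNv6 1536, SFNNv5 1024
L2, L3 = 15, 32
NUM_BUCKETS = 8        # one subnet per bucket, chosen by (piece_count - 1) // 4


class LayerStackBucket(nn.Module):
    """One layer-stack bucket: Linear L1->16, dual (sqr-)clipped ReLU, 30->32, 32->1."""

    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(L1, L2 + 1)   # 16 outputs: 15 hidden + 1 forwarded to the output
        self.l2 = nn.Linear(2 * L2, L3)   # clipped and squared-clipped halves concatenated (30 -> 32)
        self.output = nn.Linear(L3, 1)

    def forward(self, x):
        y = self.l1(x)
        hidden, forwarded = y[:, :L2], y[:, L2:]
        hidden = torch.cat(
            [torch.clamp(hidden, 0.0, 1.0),
             torch.clamp(hidden * hidden, 0.0, 1.0)], dim=1)
        hidden = torch.clamp(self.l2(hidden), 0.0, 1.0)
        return self.output(hidden) + forwarded


class SFNNv8Sketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.ft = nn.Linear(NUM_FEATURES, L1)  # dense stand-in for the sparse feature transformer
        self.buckets = nn.ModuleList(LayerStackBucket() for _ in range(NUM_BUCKETS))

    def forward(self, us_features, them_features, bucket_index):
        def transform(features):
            acc = torch.clamp(self.ft(features), 0.0, 1.0)
            # split the 2560-wide accumulator into two 1280-wide halves and multiply them
            return acc[:, : L1 // 2] * acc[:, L1 // 2:]

        # concatenate both perspectives into the 2560-wide layer-stack input
        l0 = torch.cat([transform(us_features), transform(them_features)], dim=1)
        return self.buckets[bucket_index](l0)


# Example: a batch of one dense (illustrative) input with 24 pieces -> bucket (24 - 1) // 4 == 5
model = SFNNv8Sketch()
us = torch.rand(1, NUM_FEATURES)
them = torch.rand(1, NUM_FEATURES)
print(model(us, them, bucket_index=(24 - 1) // 4))
```

The only structural difference from the SFNNv5/v6/v7 entries is the value of `L1`; everything downstream of the element-wise multiply keeps the same shapes.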