Skip to content

Commit

Permalink
[FLINK-35461][config] Deprecate options related to hash-based blockin…
Browse files Browse the repository at this point in the history
…g shuffle
  • Loading branch information
Sxnan authored and xintongsong committed May 30, 2024
1 parent 6ee3094 commit 501370d
Show file tree
Hide file tree
Showing 3 changed files with 9 additions and 26 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -14,12 +14,6 @@
<td>Boolean</td>
<td>Boolean flag indicating whether the shuffle data will be compressed for batch shuffle mode. Note that data is compressed per buffer and compression can incur extra CPU overhead, so it is more effective for IO bounded scenario when compression ratio is high.</td>
</tr>
<tr>
<td><h5>taskmanager.network.blocking-shuffle.type</h5></td>
<td style="word-wrap: break-word;">"file"</td>
<td>String</td>
<td>The blocking shuffle type, either "mmap" or "file". The "auto" means selecting the property type automatically based on system memory architecture (64 bit for mmap and 32 bit for file). Note that the memory usage of mmap is not accounted by configured memory limits, but some resource frameworks like yarn would track this memory usage and kill the container once memory exceeding some threshold. Also note that this option is experimental and might be changed future.</td>
</tr>
<tr>
<td><h5>taskmanager.network.compression.codec</h5></td>
<td style="word-wrap: break-word;">LZ4</td>
Expand Down Expand Up @@ -218,12 +212,6 @@
<td>Integer</td>
<td>Minimum number of network buffers required per blocking result partition for sort-shuffle. For production usage, it is suggested to increase this config value to at least 2048 (64M memory if the default 32K memory segment size is used) to improve the data compression ratio and reduce the small network packets. Usually, several hundreds of megabytes memory is enough for large scale batch jobs. Note: you may also need to increase the size of total network memory to avoid the 'insufficient number of network buffers' error if you are increasing this config value.</td>
</tr>
<tr>
<td><h5>taskmanager.network.sort-shuffle.min-parallelism</h5></td>
<td style="word-wrap: break-word;">1</td>
<td>Integer</td>
<td>Parallelism threshold to switch between sort-based blocking shuffle and hash-based blocking shuffle, which means for batch jobs of smaller parallelism, hash-shuffle will be used and for batch jobs of larger or equal parallelism, sort-shuffle will be used. The value 1 means that sort-shuffle is the default option. Note: For production usage, you may also need to tune 'taskmanager.network.sort-shuffle.min-buffers' and 'taskmanager.memory.framework.off-heap.batch-shuffle.size' for better performance.</td>
</tr>
<tr>
<td><h5>taskmanager.network.tcp-connection.enable-reuse-across-jobs</h5></td>
<td style="word-wrap: break-word;">true</td>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,12 +32,6 @@
<td>Boolean</td>
<td>Boolean flag indicating whether the shuffle data will be compressed for batch shuffle mode. Note that data is compressed per buffer and compression can incur extra CPU overhead, so it is more effective for IO bounded scenario when compression ratio is high.</td>
</tr>
<tr>
<td><h5>taskmanager.network.blocking-shuffle.type</h5></td>
<td style="word-wrap: break-word;">"file"</td>
<td>String</td>
<td>The blocking shuffle type, either "mmap" or "file". The "auto" means selecting the property type automatically based on system memory architecture (64 bit for mmap and 32 bit for file). Note that the memory usage of mmap is not accounted by configured memory limits, but some resource frameworks like yarn would track this memory usage and kill the container once memory exceeding some threshold. Also note that this option is experimental and might be changed future.</td>
</tr>
<tr>
<td><h5>taskmanager.network.compression.codec</h5></td>
<td style="word-wrap: break-word;">LZ4</td>
Expand Down Expand Up @@ -206,12 +200,6 @@
<td>Integer</td>
<td>Minimum number of network buffers required per blocking result partition for sort-shuffle. For production usage, it is suggested to increase this config value to at least 2048 (64M memory if the default 32K memory segment size is used) to improve the data compression ratio and reduce the small network packets. Usually, several hundreds of megabytes memory is enough for large scale batch jobs. Note: you may also need to increase the size of total network memory to avoid the 'insufficient number of network buffers' error if you are increasing this config value.</td>
</tr>
<tr>
<td><h5>taskmanager.network.sort-shuffle.min-parallelism</h5></td>
<td style="word-wrap: break-word;">1</td>
<td>Integer</td>
<td>Parallelism threshold to switch between sort-based blocking shuffle and hash-based blocking shuffle, which means for batch jobs of smaller parallelism, hash-shuffle will be used and for batch jobs of larger or equal parallelism, sort-shuffle will be used. The value 1 means that sort-shuffle is the default option. Note: For production usage, you may also need to tune 'taskmanager.network.sort-shuffle.min-buffers' and 'taskmanager.memory.framework.off-heap.batch-shuffle.size' for better performance.</td>
</tr>
<tr>
<td><h5>taskmanager.network.tcp-connection.enable-reuse-across-jobs</h5></td>
<td style="word-wrap: break-word;">true</td>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -316,8 +316,11 @@ public enum CompressionCodec {
/**
* Parallelism threshold to switch between sort-based blocking shuffle and hash-based blocking
* shuffle.
*
* @deprecated The hash-based blocking shuffle is deprecated in 1.20 and will be totally removed
* in 2.0.
*/
@Documentation.Section(Documentation.Sections.ALL_TASK_MANAGER_NETWORK)
@Deprecated
public static final ConfigOption<Integer> NETWORK_SORT_SHUFFLE_MIN_PARALLELISM =
key("taskmanager.network.sort-shuffle.min-parallelism")
.intType()
Expand Down Expand Up @@ -455,7 +458,11 @@ public enum CompressionCodec {
+ "the \"Insufficient number of network buffers\" exception, while the workloads may suffer performance "
+ "reduction silently.");

@Documentation.Section(Documentation.Sections.ALL_TASK_MANAGER_NETWORK)
/**
* @deprecated The hash-based blocking shuffle is deprecated in 1.20 and will be totally removed
* in 2.0.
*/
@Deprecated
public static final ConfigOption<String> NETWORK_BLOCKING_SHUFFLE_TYPE =
key("taskmanager.network.blocking-shuffle.type")
.stringType()
Expand Down

0 comments on commit 501370d

Please sign in to comment.