iotune: adjust num-io-queues recommendation
Ideally we would like to move away from I/O Queues and allow all shards to
dispatch. While we are not there yet, we keep using I/O Queues as a
temporary mechanism -- the evolutionary approach.

We calculate the number of I/O Queues based on the IOPS and bandwidth
limits of the device, so that each I/O Queue gets a minimum share of each
of those quantities.
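
As a rough sketch of that scheme (the per-queue minimums below are illustrative placeholders, not the values iotune actually uses):

    #include <algorithm>
    #include <cstdint>

    // Sketch: clamp the number of I/O Queues so that each queue keeps at least a
    // minimum share of the device's measured write IOPS and write bandwidth.
    // min_iops_per_queue and min_bw_per_queue are hypothetical parameters here.
    unsigned recommend_io_queues(unsigned shard_count,
                                 uint64_t write_iops, uint64_t write_bw,
                                 uint64_t min_iops_per_queue, uint64_t min_bw_per_queue) {
        unsigned by_iops = unsigned(write_iops / min_iops_per_queue);
        unsigned by_bw = unsigned(write_bw / min_bw_per_queue);
        unsigned queues = std::min({shard_count, by_iops, by_bw});
        return std::max(queues, 1u); // never recommend fewer than one queue
    }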

However, I calculated those limits early in the process, when the latency
target we had in mind was one task quota. We have since moved to 3 task
quotas as the latency target, and as a result there are, in some
situations, far fewer I/O Queues in the system than we would like.

I saw a regression when comparing Scylla 2.2 and Scylla 2.3 on an AWS
EBS device: the new iotune recommended 1 queue, whereas the old one
recommended 3. This is exactly the factor of 3 mentioned above.

My proposed fix, for now, is to adjust the requirements by that factor
of 3, so that systems keep being configured with the same number of I/O
Queues as they had before. While doing that, we also rewrite the
requirements to make it clearer where the magic numbers we use come from.
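
For reference, here is roughly what the new constants in this patch imply for the per-queue minimums (a small standalone computation for illustration, not code from the patch):

    #include <cstdio>

    int main() {
        constexpr unsigned task_quotas = 3;     // task_quotas_in_default_latency_goal
        constexpr double latency_goal = 0.0005; // one task quota, in seconds
        constexpr unsigned depth = 4;           // requests in flight per queue
        constexpr unsigned req_size = 4096;     // bytes per request

        // Each queue must be able to sustain `depth` requests within the latency goal.
        double min_iops = depth / (task_quotas * latency_goal);           // ~2667 IOPS
        double min_bw = depth * req_size / (task_quotas * latency_goal);  // ~10.4 MB/s
        std::printf("min IOPS per queue: %.0f, min bandwidth per queue: %.1f MB/s\n",
                    min_iops, min_bw / (1024 * 1024));
    }

In other words, each I/O Queue now needs roughly 2.7k write IOPS and about 10 MB/s of write bandwidth, compared with the 10k IOPS and 100 MB thresholds being removed below.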

As a next step, we should prioritize local all-shards dispatch.

Signed-off-by: Glauber Costa <[email protected]>
Message-Id: <[email protected]>
Glauber Costa authored and avikivity committed Aug 19, 2018
1 parent f416e78 commit 858bef5
Showing 1 changed file with 12 additions and 4 deletions.
16 changes: 12 additions & 4 deletions apps/iotune/iotune.cc
@@ -578,6 +578,9 @@ fs::path mountpoint_of(sstring filename) {
     return mnt_candidate;
 }
 
+static constexpr unsigned task_quotas_in_default_latency_goal = 3;
+static constexpr float latency_goal = 0.0005;
+
 int main(int ac, char** av) {
     namespace bpo = boost::program_options;
     bool fs_check = false;
@@ -674,10 +677,15 @@ int main(int ac, char** av) {
 
     unsigned num_io_queues = smp::count;
     for (auto& desc : disk_descriptors) {
-        // Allow each I/O Queue to have at least 10k IOPS and 100MB. Values decided based
-        // on the write performance, which tends to be lower.
-        num_io_queues = std::min(smp::count, unsigned(desc.write_iops / 10000));
-        num_io_queues = std::min(smp::count, unsigned(desc.write_bw / (100 * 1024 * 1024)));
+        // Ideally we wouldn't have I/O Queues and would dispatch from every shard (https://github.com/scylladb/seastar/issues/485)
+        // While we don't do that, we'll just be conservative and try to recommend values of I/O Queues that are close to what we
+        // suggested before the I/O Scheduler rework. The I/O Scheduler has traditionally tried to make sure that each queue would have
+        // at least 4 requests in depth, and all its requests were 4kB in size. Therefore, try to arrange the I/O Queues so that we would
+        // end up in the same situation here (that's where the 4 comes from).
+        //
+        // For the bandwidth limit, we want that to be 4 * 4096, so each I/O Queue has the same bandwidth as before.
+        num_io_queues = std::min(smp::count, unsigned((task_quotas_in_default_latency_goal * desc.write_iops * latency_goal) / 4));
+        num_io_queues = std::min(num_io_queues, unsigned((task_quotas_in_default_latency_goal * desc.write_bw * latency_goal) / (4 * 4096)));
         num_io_queues = std::max(num_io_queues, 1u);
     }
     fmt::print("Recommended --num-io-queues: {}\n", num_io_queues);
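
As a rough worked example of what the new computation above recommends (device and shard figures invented for illustration): on a 4-shard machine whose device measures 8000 write IOPS and 60 MB/s of write bandwidth, the patched arithmetic yields 3 queues, whereas the removed 10k IOPS / 100 MB thresholds would have fallen back to the 1-queue minimum:

    #include <algorithm>
    #include <cstdio>

    int main() {
        const unsigned smp_count = 4;               // hypothetical shard count
        const double write_iops = 8000;             // hypothetical measured write IOPS
        const double write_bw = 60.0 * 1024 * 1024; // hypothetical measured write bandwidth

        // Mirrors the arithmetic in the patched loop above.
        const unsigned task_quotas = 3;
        const double latency_goal = 0.0005;
        unsigned n = std::min(smp_count, unsigned((task_quotas * write_iops * latency_goal) / 4));
        n = std::min(n, unsigned((task_quotas * write_bw * latency_goal) / (4 * 4096)));
        n = std::max(n, 1u);
        std::printf("Recommended --num-io-queues: %u\n", n); // prints 3 for these figures
    }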