Skip to content

Commit

Permalink
iotune: Fix SIGFPE with some executions
Browse files Browse the repository at this point in the history
Even after we fixed iotune to call update_current_best() after a timeout, some
crashes are still happening. The reason becomes clear after we investigate a
dump of the current state after the crash, and is as follows:

After the end of either phase 1 or 2, the list of points to evaluate can become
empty. Because at this point we haven't yet called update_current_best, we bail
(as empty list is our signal to stop) and concurrency is 0 at this point.

We could of course force update_current_best to be called at this point, but
that is not the cleanest solution, for those lists should never be empty.

Investigating the reason why the list was empty, I could see that the problem
happened in phase 2, and was not that the disks in which this was happening was
slow or fast, but erratic. The way we calculate our iterators bounds for phase 2
is to search for the point in which we reach (80 - 20) % throughput for a lower
bound, and later on (80 + 10) % for an upper bound. In an erratic disk, the
second point can happen before the first, in which case we will go until the
end() of the iterator.

Since we calculate the concurrency at the iterator, and end() means really
post-the-end, that is an invalid point and the concurrency value at that point
is undetermined. It can very well turn out to be a value smaller than the
minimum, in which case the queue is empty.

There are other cases as well, known by analysis in which the queue can be
empty. They are cases in which we would go until the end anyway. For instance,
the maximum achieved throughput in phase1 could be reached in the last point of
the curve, in which point the queue would be empty (not observed).

Both problems are fixed by calling a special helper that will calculate the
boundaries and handle the end-of-list case speciallythe maximum achieved
throughput in phase1 could be reached in the last point of the curve, in which
point the queue would be empty (not observed).

Both problems are fixed by calling a special helper that will calculate the
boundaries and handle the end-of-list case specially. The loop in phase2 is
broken into two but that is only accessory to the problem. I find that this
improves readability about what we are trying to achieve.

Signed-off-by: Glauber Costa <[email protected]>
Message-Id: <[email protected]>
  • Loading branch information
Glauber Costa authored and avikivity committed Jul 20, 2016
1 parent 980b367 commit 5d42b55
Showing 1 changed file with 23 additions and 9 deletions.
32 changes: 23 additions & 9 deletions apps/iotune/iotune.cc
Original file line number Diff line number Diff line change
Expand Up @@ -196,6 +196,21 @@ class iotune_manager {
// concurrency.
std::map<uint64_t, uint64_t> _all_results = { {0ul, 0ul} };

void refill_concurrency_queue(auto begin, auto end) {
unsigned first = (*begin).first;
unsigned last;

if (end == _all_results.end()) {
last = (*_all_results.rbegin()).first;
} else {
last = (*end).first;
}

for (auto i = first; i < last; ++i) {
_concurrency_queue.push(i);
}
}

void find_max_region(const run_stats& result) {
if (result.IOPS > _best_result.IOPS) {
_best_result = result;
Expand All @@ -206,12 +221,8 @@ class iotune_manager {
std::cout << "Refining search for maximum. So far, " << _best_result.IOPS << " IOPS" << std::endl;
_phase_timing = 500ms;
auto it = _all_results.find(_best_result.concurrency);
auto prev = std::prev(it);
auto next = std::next(it);

for (auto i = (*prev).first; i < (*next).first; ++i) {
_concurrency_queue.push(i);
}
refill_concurrency_queue(std::prev(it), std::next(it));
_current_test_phase = test_phase::find_max_point;
}
}
Expand All @@ -230,15 +241,18 @@ class iotune_manager {
for (auto it = _all_results.begin(); it != _all_results.end(); ++it) {
if (((*it).second > (percentile - 0.20) * _best_result.IOPS) && (iterator_of_minimum == _all_results.begin())) {
iterator_of_minimum = it;
} else if ((*it).second > ((percentile + 0.10) * _best_result.IOPS)) {
iterator_of_maximum = it;
break;
}
}

for (auto i = (*iterator_of_minimum).first; i < (*iterator_of_maximum).first; ++i) {
_concurrency_queue.push(i);
for (auto it = std::next(iterator_of_minimum); it != _all_results.end(); ++it) {
if ((*it).second > ((percentile + 0.10) * _best_result.IOPS)) {
iterator_of_maximum = it;
break;
}
}

refill_concurrency_queue(iterator_of_minimum, iterator_of_maximum);
}
}

Expand Down

0 comments on commit 5d42b55

Please sign in to comment.