Commit
iotune: Fix SIGFPE with some executions
Even after we fixed iotune to call update_current_best() after a timeout, some crashes are still happening. The reason becomes clear once we inspect a dump of the state after the crash, and is as follows:

After the end of either phase 1 or phase 2, the list of points to evaluate can become empty. Because we haven't yet called update_current_best() at this point, we bail out (an empty list is our signal to stop) with concurrency still at 0. We could of course force update_current_best() to be called here, but that is not the cleanest solution, since those lists should never be empty.

Investigating why the list was empty, I could see that the problem happened in phase 2, and that the disks on which it happened were neither slow nor fast, but erratic. The way we calculate our iterator bounds for phase 2 is to search for the point at which we reach (80 - 20) % throughput for a lower bound, and later on (80 + 10) % for an upper bound. On an erratic disk, the second point can come before the first, in which case we go all the way to the end() of the iterator. Since we read the concurrency from the iterator, and end() really means past-the-end, that is an invalid point and the concurrency value there is indeterminate. It can very well turn out to be smaller than the minimum, in which case the queue is empty.

There are other cases, known from analysis, in which the queue can be empty. They are cases in which we would go until the end anyway. For instance, the maximum throughput achieved in phase 1 could be reached at the last point of the curve, at which point the queue would be empty (not observed).
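The end() hazard described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not iotune's actual code: the `sample` struct, the `phase2_bounds` name, and the choice of clamping to the last valid sample are all hypothetical, and the thresholds are passed in as absolute throughput values rather than derived the way iotune derives them.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical measurement point: not iotune's real data structure.
struct sample {
    unsigned concurrency;
    double throughput;
};

// Returns the {lower, upper} concurrency bounds for the next phase.
// Assumption: whenever a search runs off the end of the curve, we clamp
// to the last valid sample instead of reading anything from end().
std::pair<unsigned, unsigned> phase2_bounds(const std::vector<sample>& curve,
                                            double lower_tput,
                                            double upper_tput) {
    assert(!curve.empty());
    const auto last_valid = std::prev(curve.end());

    // Lower bound: first point that reaches the lower threshold.
    auto lo = std::find_if(curve.begin(), curve.end(),
        [&](const sample& s) { return s.throughput >= lower_tput; });
    if (lo == curve.end()) {
        lo = last_valid;
    }

    // Upper bound: first point strictly after the lower bound that reaches
    // the upper threshold. On an erratic curve this may never happen.
    auto hi = std::find_if(std::next(lo), curve.end(),
        [&](const sample& s) { return s.throughput >= upper_tput; });
    if (hi == curve.end()) {
        hi = last_valid;  // end-of-list case: never dereference end()
    }

    return {lo->concurrency, hi->concurrency};
}
```

With an erratic curve whose only spike above the upper threshold comes first, the upper-bound search finds nothing after the lower bound, and the sketch falls back to the concurrency of the last measured point rather than an indeterminate value.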
Both problems are fixed by calling a special helper that calculates the boundaries and handles the end-of-list case specially. The loop in phase 2 is split into two, but that is only incidental to the fix; I find that it makes what we are trying to achieve more readable.

Signed-off-by: Glauber Costa <[email protected]>
Message-Id: <[email protected]>