Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8339648: ZGC: Division by zero in rule_major_allocation_rate #20888

Closed
wants to merge 4 commits into from

Conversation

MBaesken
Copy link
Member

@MBaesken MBaesken commented Sep 6, 2024

The HS jtreg test gc/stringdedup/TestStringDeduplicationAgeThreshold_ZGenerational
shows this error when running with ubsan enabled

src/hotspot/share/gc/z/zDirector.cpp:491:74: runtime error: division by zero
#0 0x7f09886401d4 in rule_major_allocation_rate src/hotspot/share/gc/z/zDirector.cpp:491
#1 0x7f09886401d4 in start_gc src/hotspot/share/gc/z/zDirector.cpp:822
#2 0x7f09886401d4 in ZDirector::run_thread() src/hotspot/share/gc/z/zDirector.cpp:912
#3 0x7f098c1404e8 in ZThread::run_service() src/hotspot/share/gc/z/zThread.cpp:29
#4 0x7f09897cac19 in ConcurrentGCThread::run() src/hotspot/share/gc/shared/concurrentGCThread.cpp:48
#5 0x7f098bb46b0a in Thread::call_run() src/hotspot/share/runtime/thread.cpp:225
#6 0x7f098b1a9881 in thread_native_entry src/hotspot/os/linux/os_linux.cpp:858


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8339648: ZGC: Division by zero in rule_major_allocation_rate (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/20888/head:pull/20888
$ git checkout pull/20888

Update a local copy of the PR:
$ git checkout pull/20888
$ git pull https://git.openjdk.org/jdk.git pull/20888/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 20888

View PR using the GUI difftool:
$ git pr show -t 20888

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/20888.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 6, 2024

👋 Welcome back mbaesken! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Sep 6, 2024

@MBaesken This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8339648: ZGC: Division by zero in rule_major_allocation_rate

Reviewed-by: aboldtch, lucy, tschatzl

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 120 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot changed the title JDK-8339648: ZGC: Division by zero in rule_major_allocation_rate 8339648: ZGC: Division by zero in rule_major_allocation_rate Sep 6, 2024
@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 6, 2024
@openjdk
Copy link

openjdk bot commented Sep 6, 2024

@MBaesken The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge
Copy link

mlbridge bot commented Sep 6, 2024

Webrevs

@MBaesken
Copy link
Member Author

MBaesken commented Sep 6, 2024

So we should avoid the division in case the divisor is zero and rewrite the coding a bit.

github-actions bot pushed a commit to dougxc/openjdk-pr-canary that referenced this pull request Sep 6, 2024
8339648: ZGC: Division by zero in rule_major_allocation_rate
Comment on lines 520 to 524
bool old_garbage_is_cheaper = true;
if (reclaimed_per_old_gc != 0) {
const double current_old_gc_time_per_bytes_freed = double(old_gc_time) / double(reclaimed_per_old_gc);
old_garbage_is_cheaper = current_old_gc_time_per_bytes_freed < current_young_gc_time_per_bytes_freed;
}
Copy link
Member

@xmas92 xmas92 Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ending up with old_garbage_is_cheaper == true when reclaimed_per_old_gc == 0 seems wrong to me.

Division by 0.0 is weird in C++. Do we even build for systems where it would not be supported. But regardless to me I feel like the change here should be more like:

-  const double current_old_gc_time_per_bytes_freed = double(old_gc_time) / double(reclaimed_per_old_gc);
+  const double current_old_gc_time_per_bytes_freed = reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity : double(old_gc_time) / double(reclaimed_per_old_gc);

Which is the behaviour I expect us to currently have, given that old_gc_time should be a positive number (>0.0). The !stats._old_stats._cycle._is_time_trustable check above should protect against 0.0.

I expect that this division we see happens when we have run a warmup major collection which did no reclaim any memory. And this change would trigger us to try and promote a minor collection to a major collection.

I am no expert on our supported platforms matrix w.r.t. floating numbers and std::numeric_limits<T>::has_infinity.

@MBaesken
Copy link
Member Author

MBaesken commented Sep 9, 2024

looks like for clang
https://bugs.llvm.org/show_bug.cgi?id=17000#c1
the float division by 0 became defined behavior, but it might be different for other compilers. I think it depends not only on the platform but also on the compiler. See the discussion here https://stackoverflow.com/questions/42926763/the-behaviour-of-floating-point-division-by-zero

@MBaesken
Copy link
Member Author

MBaesken commented Sep 9, 2024

Hi Axel, I adjusted the coding following your suggestion.
Btw. is there maybe already somewhere a template function doing that division handling divisor 0? Probably it is not the only place in the codebase where this can happen ?

Copy link
Member

@xmas92 xmas92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with this change. But I am not 100% about the use of std::numeric_limits<double>::infinity(). Maybe someone else can chime in.

Not sure if there are any other places we have expect division by zero to result in infinity.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 11, 2024
@MBaesken
Copy link
Member Author

I am fine with this change. But I am not 100% about the use of std::numeric_limits<double>::infinity(). Maybe someone else can chime in.

Not sure if there are any other places we have expect division by zero to result in infinity.

Thanks for the review !

Seems this exists since c++11 https://en.cppreference.com/w/cpp/types/numeric_limits/infinity so usage should be okay.
We also find it in libsimdsort (linux only in OpenJDK however) https://github.com/openjdk/jdk/blob/master/src/java.base/linux/native/libsimdsort/xss-common-includes.h#L47

Copy link
Contributor

@tschatzl tschatzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm but see the additional comment.

@@ -488,7 +488,7 @@ static bool rule_major_allocation_rate(const ZDirectorStats& stats) {

// Calculate the GC cost for each reclaimed byte
const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this division have the same issue?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it could if no memory has been reclaimed at all (since the VM started). Similar issues would occur in the call to calculate_extra_young_gc_time below. And there I think the problem is even worse, because we might end up with inf - inf == -nan.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case where we have performed a major collection and no young collection has reclaim any memory seems like a very degenerate situation. The solution is probably to handle that case separately, and not try to adapt the current heuristics to handle the extreme values.

@@ -488,7 +488,7 @@ static bool rule_major_allocation_rate(const ZDirectorStats& stats) {

// Calculate the GC cost for each reclaimed byte
const double current_young_gc_time_per_bytes_freed = double(young_gc_time) / double(reclaimed_per_young_gc);
const double current_old_gc_time_per_bytes_freed = double(old_gc_time) / double(reclaimed_per_old_gc);
const double current_old_gc_time_per_bytes_freed = reclaimed_per_old_gc == 0 ? std::numeric_limits<double>::infinity() : double(old_gc_time) / double(reclaimed_per_old_gc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using some parentheses? To my understanding, the division has a higher precedence than the ternary conditional expression. See: https://en.cppreference.com/w/cpp/language/operator_precedence

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not mind parentheses. But ternary are the lowest precedence (if you do not count the , which I would almost always say is wrong to use without a surrounding () / [] / {}), so to me it seems superfluous.

Just to clarify the intent of this code is what we are getting with a higher precedence on division. That is:

  const double current_old_gc_time_per_bytes_freed = ((reclaimed_per_old_gc == 0) ? (std::numeric_limits<double>::infinity()) : (double(old_gc_time) / double(reclaimed_per_old_gc)));

Side Note:
I also think I prefer immediately invoked lambdas when the ternaries get this long.

  const double current_old_gc_time_per_bytes_freed = [&]() {
    if (reclaimed_per_old_gc == 0) {
      return std::numeric_limits<double>::infinity();
    }
    return double(old_gc_time) / double(reclaimed_per_old_gc);
  }();

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Sep 16, 2024
Comment on lines 491 to 492
const double current_old_gc_time_per_bytes_freed = ((reclaimed_per_old_gc == 0) ? (std::numeric_limits<double>::infinity())
: (double(old_gc_time) / double(reclaimed_per_old_gc)));

This comment was marked as resolved.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 17, 2024
Copy link
Contributor

@RealLucy RealLucy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now. Thanks

@MBaesken
Copy link
Member Author

Thanks for the reviews !

/integrate

@openjdk
Copy link

openjdk bot commented Sep 17, 2024

Going to push as commit 80db6e7.
Since your change was applied there have been 122 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Sep 17, 2024
@openjdk openjdk bot closed this Sep 17, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 17, 2024
@openjdk
Copy link

openjdk bot commented Sep 17, 2024

@MBaesken Pushed as commit 80db6e7.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-gc [email protected] integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

4 participants