Commit

p100nvcc: Workaround a compiler bug on P100
ghstack-source-id: 7ba0bcc244b6bf10fe716371704f41a246a09ed8
Pull Request resolved: facebookresearch#434
danthe3rd committed Sep 29, 2022
1 parent 7f9d25f commit 06f801d
Showing 1 changed file with 10 additions and 0 deletions.
@@ -228,6 +228,16 @@ class PredicatedTileAccessIteratorResidualLast<
the_predicates.get_mask(residual_tile_mask);
the_predicates.compute_predicates_(extent, true);

// Work around a compiler bug observed on P100 in the backward pass.
// Values observed together:
//   the_predicates.predicates_[0] = 14 (expected 15)
//   residual_tile_mask[0] = 15 (correct)
//
// Adding prints where the value is computed (in `compute_predicates_`)
// sometimes makes the bug disappear. The consequence of the bug is that
// we skip the first element of a tensor, which leads to wrong results.
// In theory, the line below should always be a no-op.
the_predicates.predicates_[0] |= residual_tile_mask[0];

// update internal pointers
Layout layout(params_.stride_);
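
The effect of the added `|=` line can be checked in isolation. The sketch below is not the CUTLASS iterator: `PredicateWords`, `reapply_residual_mask`, and `patched_word` are hypothetical names introduced here, and it assumes, as the values in the comment suggest, that each bit of a predicate word guards one access, with bit 0 guarding the first element. Under that assumption, OR-ing the mask into the predicates changes nothing when every mask bit is already set, and restores the dropped bit when the miscompiled value 14 appears.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the iterator's predicate storage: a single
// 32-bit word in which bit i set means "access i is allowed".
using PredicateWords = std::array<std::uint32_t, 1>;

// Mirrors the workaround line from the diff: OR word 0 of the residual tile
// mask into word 0 of the predicates. If the predicates already contain
// every bit of the mask, the word is left unchanged.
inline void reapply_residual_mask(PredicateWords& predicates,
                                  const PredicateWords& residual_tile_mask) {
    predicates[0] |= residual_tile_mask[0];
}

// Convenience wrapper (an assumption for illustration only): applies the
// workaround to a single word and returns the patched value.
inline std::uint32_t patched_word(std::uint32_t predicate_word,
                                  std::uint32_t mask_word) {
    PredicateWords predicates{predicate_word};
    PredicateWords mask{mask_word};
    reapply_residual_mask(predicates, mask);
    return predicates[0];
}
```

With the values reported in the comment, `patched_word(14u, 15u)` yields 15 (bit 0 is restored), while `patched_word(15u, 15u)` stays 15, which is the no-op case the comment describes.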
