
Conversation

@dutow dutow (Collaborator) commented Aug 11, 2025

Previously we simply set the LSN for the new key to the first write
location.

This is not correct, however, as there are several corner cases:

  • recovery / replication might write old LSNs
  • we can't handle multiple keys with the same TLI/LSN, which can happen
    with quick restarts without writes

To handle these cases, this commit makes the following changes:

  • We only activate new keys outside crash recovery, or immediately if
    encryption is turned off
  • We also take the already existing last key into account (if one
    exists), and only activate a new key once we have progressed past its
    start location

The remaining changes are just support infrastructure for this:

  • Since we might rewrite old records, we use the already existing keys
    for those writes, not the active last keys
  • We prefetch existing keys during initialization, so that key lookup
    doesn't accidentally happen inside the critical section during a write
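The activation rule above can be sketched in C. This is a minimal illustration, not pg_tde's actual code: the struct and helper names (`WalLocation`, `wal_location_cmp`, `can_activate_new_key`) are modeled loosely on the patch, and the exact boundary condition is an assumption taken from the commit message:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* WAL log sequence number */
typedef uint32_t TimeLineID;

/* A WAL location is a (timeline, LSN) pair; illustrative struct. */
typedef struct WalLocation
{
    TimeLineID  tli;
    XLogRecPtr  lsn;
} WalLocation;

/* Order locations by timeline first, then by LSN within a timeline. */
int
wal_location_cmp(WalLocation a, WalLocation b)
{
    if (a.tli != b.tli)
        return a.tli < b.tli ? -1 : 1;
    if (a.lsn != b.lsn)
        return a.lsn < b.lsn ? -1 : 1;
    return 0;
}

/*
 * Decide whether a new WAL key may become the active key at write_loc:
 *   - immediately, if encryption is being turned off;
 *   - never during crash recovery (old LSNs may still be rewritten);
 *   - otherwise only once we have progressed strictly past the start
 *     location of the already existing last key, which also avoids two
 *     keys sharing the same TLI/LSN after a quick restart without writes.
 */
bool
can_activate_new_key(bool in_crash_recovery,
                     bool encryption_enabled,
                     bool have_last_key,
                     WalLocation last_key_start,
                     WalLocation write_loc)
{
    if (!encryption_enabled)
        return true;
    if (in_crash_recovery)
        return false;
    if (have_last_key && wal_location_cmp(last_key_start, write_loc) >= 0)
        return false;
    return true;
}
```

Note that the equal-location case returns false here, which is one way to avoid the "multiple keys with the same TLI/LSN" problem described above.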

There is a remaining bug with stopping WAL encryption, also mentioned in
a TODO comment in the code. It will be addressed in a later PR, as this
fix has already taken too long.

--

And two additional bugfixes, see the separate commits.

@dutow dutow force-pushed the pg1605 branch 2 times, most recently from fed0689 to a9b1adf on August 12, 2025 20:05
@dutow dutow changed the title PG-1605 Supporting multiple WAL key changes in the last segment PG-1604: Improve last key LSN calculation logic Aug 12, 2025
@dutow dutow marked this pull request as ready for review August 12, 2025 20:06
@dutow dutow force-pushed the pg1605 branch 2 times, most recently from 295c016 to 82b1ce9 on August 12, 2025 21:17
@codecov-commenter commented Aug 12, 2025

Codecov Report

❌ Patch coverage is 97.67442% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.51%. Comparing base (b2bb77c) to head (e3b548c).
⚠️ Report is 8 commits behind head on TDE_REL_17_STABLE.

❌ Your project status has failed because the head coverage (82.51%) is below the target coverage (90.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@                  Coverage Diff                  @@
##           TDE_REL_17_STABLE     #519      +/-   ##
=====================================================
+ Coverage              82.19%   82.51%   +0.31%     
=====================================================
  Files                     25       25              
  Lines                   3174     3209      +35     
  Branches                 515      510       -5     
=====================================================
+ Hits                    2609     2648      +39     
+ Misses                   456      452       -4     
  Partials                 109      109              
Components Coverage Δ
access 84.95% <97.67%> (+1.47%) ⬆️
catalog 87.68% <ø> (+0.07%) ⬆️
common 77.77% <ø> (ø)
encryption 72.97% <ø> (ø)
keyring 73.21% <ø> (ø)
src 94.15% <ø> (ø)
smgr 96.53% <ø> (+1.23%) ⬆️
transam ∅ <ø> (∅)

@dutow dutow force-pushed the pg1605 branch 5 times, most recently from 637f46e to 0049350 on August 13, 2025 10:09
@jeltz jeltz (Collaborator) left a comment


Review of the bugfix commit.

@jeltz jeltz (Collaborator) left a comment


I am still not a huge fan of rewinding and using the same key to write what we think should be new data, but which technically does not have to be.

The min/max comparisons of LSNs assumed that everything is on the same
timeline. In practice, with replication + recovery combinations, keys can
span at least three timelines, which means the timeline has to be
included in both comparisons, as on other timelines the restrictions are
less strict.
last_key_loc.tli = TDEXLogGetEncKeyTli();

lastKeyUsable = (TDEXLogGetEncKeyLsn() != 0);
afterLastKey = wal_location_cmp(last_key_loc, loc) <= 0;
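The pitfall described here can be shown with a small sketch (the types are hypothetical, modeled loosely on PostgreSQL's `XLogRecPtr`/`TimeLineID`; only `wal_location_cmp` appears in the snippet above): comparing raw LSNs mis-orders locations that sit on different timelines, so the timeline must take precedence.

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

typedef struct WalLocation
{
    TimeLineID  tli;
    XLogRecPtr  lsn;
} WalLocation;

/* The old, broken assumption: everything lives on one timeline. */
int
lsn_only_cmp(WalLocation a, WalLocation b)
{
    return (a.lsn > b.lsn) - (a.lsn < b.lsn);
}

/* Timeline-aware ordering: compare timelines first, then LSNs. */
int
wal_location_cmp(WalLocation a, WalLocation b)
{
    if (a.tli != b.tli)
        return a.tli < b.tli ? -1 : 1;
    return (a.lsn > b.lsn) - (a.lsn < b.lsn);
}
```

For example, a key starting at (tli=2, lsn=100) is later than one at (tli=1, lsn=500) even though its raw LSN is smaller; an LSN-only min/max would pick the wrong key.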
Collaborator


Is it even necessary to check for this? Don't we write the WAL in a linear fashion? Once recovery is done, can we really write anything to old locations?

Collaborator Author


Yes, it is, because replicas might go back to earlier locations when rewriting an entire segment.

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even after recovery is done? Wow. I think this needs to be explained with a comment in the code.

Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even after recovery is done?

Replicas are always in recovery, so we can't rely on just checking the recovery status. That was one of my earlier solutions, but because of it replicas basically never generated new keys (only in some corner cases).

So the current solution restricts the check to crash recovery, and generates a new key even on replicas once we have progressed past the last key.
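This distinction, that only crash recovery should block key activation, since standbys are permanently in recovery, can be sketched as follows (the enum and predicate are hypothetical and for illustration only; PostgreSQL tracks recovery state differently):

```c
#include <stdbool.h>

/* Illustrative recovery states, not PostgreSQL's actual representation. */
typedef enum RecoveryState
{
    RECOVERY_NONE,     /* normal primary operation */
    RECOVERY_CRASH,    /* replaying WAL after an unclean shutdown */
    RECOVERY_STANDBY   /* archive/streaming recovery on a replica */
} RecoveryState;

/*
 * Gating on "in recovery at all" would block standbys forever, because a
 * replica never leaves recovery. Only crash recovery must block new-key
 * activation, since it may still rewrite records at old LSNs.
 */
bool
key_activation_blocked_by_recovery(RecoveryState state)
{
    return state == RECOVERY_CRASH;
}
```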

@dutow dutow merged commit 9dfed22 into percona:TDE_REL_17_STABLE Aug 14, 2025
18 of 19 checks passed