
Conversation

@dutow dutow (Collaborator) commented Aug 11, 2025

Previously we simply set the LSN for the new key to the first write
location.

This is not correct, however, as there are several corner cases:

  • recovery / replication might write old LSNs
  • we can't handle multiple keys with the same TLI/LSN, which can happen
    with quick restarts without writes

To handle these cases, this commit makes the following changes:

  • We only activate new keys outside crash recovery, or immediately if
    encryption is turned off
  • We also take the already existing last key into account (if one
    exists), and only activate a new key once we have progressed past its
    start location

The remaining changes are just support infrastructure for this:

  • Since we might rewrite old records, we use the already existing keys
    for those writes, not the active last keys
  • We prefetch existing keys during initialization, so that key lookup
    doesn't accidentally happen inside the critical section during a write
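The activation rule above can be sketched in C. This is a minimal illustration, not pg_tde's actual code: the struct and helper names (`WalLocation`, `wal_location_cmp`, `can_activate_new_key`) are modeled loosely on the patch, and the exact boundary condition is an assumption taken from the commit message:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;   /* WAL log sequence number */
typedef uint32_t TimeLineID;

/* A WAL location is a (timeline, LSN) pair; illustrative struct. */
typedef struct WalLocation
{
    TimeLineID  tli;
    XLogRecPtr  lsn;
} WalLocation;

/* Order locations by timeline first, then by LSN within a timeline. */
int
wal_location_cmp(WalLocation a, WalLocation b)
{
    if (a.tli != b.tli)
        return a.tli < b.tli ? -1 : 1;
    if (a.lsn != b.lsn)
        return a.lsn < b.lsn ? -1 : 1;
    return 0;
}

/*
 * Decide whether a new WAL key may become the active key at write_loc:
 *   - immediately, if encryption is being turned off;
 *   - never during crash recovery (old LSNs may still be rewritten);
 *   - otherwise only once we have progressed strictly past the start
 *     location of the already existing last key, which also avoids two
 *     keys sharing the same TLI/LSN after a quick restart without writes.
 */
bool
can_activate_new_key(bool in_crash_recovery,
                     bool encryption_enabled,
                     bool have_last_key,
                     WalLocation last_key_start,
                     WalLocation write_loc)
{
    if (!encryption_enabled)
        return true;
    if (in_crash_recovery)
        return false;
    if (have_last_key && wal_location_cmp(last_key_start, write_loc) >= 0)
        return false;
    return true;
}
```

Note that the equal-location case returns false here, which is one way to avoid the "multiple keys with the same TLI/LSN" problem described above.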

There is a remaining bug with stopping WAL encryption, also mentioned in
a TODO comment in the code. It will be addressed in a later PR, as this
fix has already taken too long.

--

And two additional bugfixes, see the separate commits.

@dutow dutow force-pushed the pg1605 branch 2 times, most recently from fed0689 to a9b1adf on August 12, 2025 20:05
@dutow dutow changed the title PG-1605 Supporting multiple WAL key changes in the last segment PG-1604: Improve last key LSN calculation logic Aug 12, 2025
@dutow dutow marked this pull request as ready for review August 12, 2025 20:06
@dutow dutow force-pushed the pg1605 branch 2 times, most recently from 295c016 to 82b1ce9 on August 12, 2025 21:17
@codecov-commenter commented Aug 12, 2025

Codecov Report

❌ Patch coverage is 97.67442% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 82.51%. Comparing base (b2bb77c) to head (e3b548c).
⚠️ Report is 8 commits behind head on TDE_REL_17_STABLE.

❌ Your project status has failed because the head coverage (82.51%) is below the target coverage (90.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@                  Coverage Diff                  @@
##           TDE_REL_17_STABLE     #519      +/-   ##
=====================================================
+ Coverage              82.19%   82.51%   +0.31%     
=====================================================
  Files                     25       25              
  Lines                   3174     3209      +35     
  Branches                 515      510       -5     
=====================================================
+ Hits                    2609     2648      +39     
+ Misses                   456      452       -4     
  Partials                 109      109              
Components Coverage Δ
access 84.95% <97.67%> (+1.47%) ⬆️
catalog 87.68% <ø> (+0.07%) ⬆️
common 77.77% <ø> (ø)
encryption 72.97% <ø> (ø)
keyring 73.21% <ø> (ø)
src 94.15% <ø> (ø)
smgr 96.53% <ø> (+1.23%) ⬆️
transam ∅ <ø> (∅)

@dutow dutow force-pushed the pg1605 branch 5 times, most recently from 637f46e to 0049350 on August 13, 2025 10:09
@jeltz jeltz (Collaborator) left a comment


Review of the bugfix commit.

@jeltz jeltz (Collaborator) left a comment


I am still not a huge fan of rewinding and using the same key to write what we think should be new data, but which technically does not have to be.

The min/max comparisons of LSNs assumed that everything is on the same
timeline. In practice, with replication + recovery combinations, keys can
span at least three timelines, which means the timeline has to be
included in both comparisons, as on other timelines the restrictions are
less strict.
last_key_loc.tli = TDEXLogGetEncKeyTli();

lastKeyUsable = (TDEXLogGetEncKeyLsn() != 0);
afterLastKey = wal_location_cmp(last_key_loc, loc) <= 0;
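The pitfall described here can be shown with a small sketch (the types are hypothetical, modeled loosely on PostgreSQL's `XLogRecPtr`/`TimeLineID`; only `wal_location_cmp` appears in the snippet above): comparing raw LSNs mis-orders locations that sit on different timelines, so the timeline must take precedence.

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

typedef struct WalLocation
{
    TimeLineID  tli;
    XLogRecPtr  lsn;
} WalLocation;

/* The old, broken assumption: everything lives on one timeline. */
int
lsn_only_cmp(WalLocation a, WalLocation b)
{
    return (a.lsn > b.lsn) - (a.lsn < b.lsn);
}

/* Timeline-aware ordering: compare timelines first, then LSNs. */
int
wal_location_cmp(WalLocation a, WalLocation b)
{
    if (a.tli != b.tli)
        return a.tli < b.tli ? -1 : 1;
    return (a.lsn > b.lsn) - (a.lsn < b.lsn);
}
```

For example, a key starting at (tli=2, lsn=100) is later than one at (tli=1, lsn=500) even though its raw LSN is smaller; an LSN-only min/max would pick the wrong key.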
Collaborator


Is it even necessary to check for this? Don't we write the WAL in a linear fashion? Once recovery is done, can we really write anything to old locations?

Collaborator Author


Yes, it is, because replicas might go back to earlier locations when rewriting an entire segment.

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even after recovery is done? Wow. I think this needs to be explained with a comment in the code.

Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even after recovery is done?

Replicas are always in recovery, so we can't rely on just checking the recovery status. That was one of my earlier solutions, but because of it replicas basically never generated new keys (only in some corner cases).

So the current solution restricts the check to crash recovery, and generates a new key even on replicas once we have progressed past the last key.
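This distinction, that only crash recovery should block key activation, since standbys are permanently in recovery, can be sketched as follows (the enum and predicate are hypothetical and for illustration only; PostgreSQL tracks recovery state differently):

```c
#include <stdbool.h>

/* Illustrative recovery states, not PostgreSQL's actual representation. */
typedef enum RecoveryState
{
    RECOVERY_NONE,     /* normal primary operation */
    RECOVERY_CRASH,    /* replaying WAL after an unclean shutdown */
    RECOVERY_STANDBY   /* archive/streaming recovery on a replica */
} RecoveryState;

/*
 * Gating on "in recovery at all" would block standbys forever, because a
 * replica never leaves recovery. Only crash recovery must block new-key
 * activation, since it may still rewrite records at old LSNs.
 */
bool
key_activation_blocked_by_recovery(RecoveryState state)
{
    return state == RECOVERY_CRASH;
}
```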

@dutow dutow merged commit 9dfed22 into percona:TDE_REL_17_STABLE Aug 14, 2025
18 of 19 checks passed