Skip to content

Conversation

jeltz
Copy link
Collaborator

@jeltz jeltz commented Aug 19, 2025

Unfortunately the logic for generating a new key to protect the stream cipher used to encrypt the WAL stream in our restore command was based on totally incorrect assumptions due to how the rerecord is implemented.

Recovery is a state machine which can go back and forward between one mode where it streams from a primary and another where it first tries to fetch WAL from the archive and if that fails from the pg_wal directory, and in the pg_wal directory we may have files which are encrypted with whatever keys were there originally.

To handle all the possible scenarios we remove the ability of pg_tde_restore_encrypt to generate new keys and just has it use whatever keys there are in the key file. This unfortunately means we open ourselves to some attacks on the stream cipher if the system is tricked into encrypting a different WAL stream at the same TLI and LSN as we already have encrypted. As far as I know this should be rare under normal operations since normally e.g. the WAL should be the same in the archive as the one in pg_wal or which we receive through streaming.

Ideally we would want to fix this but for now it is better to have WAL encryption with this weakness than to not have it at all.

This also incidentally fixes a bug we discovered caused by generating a new key only invalidating one key rather than all keys which should have become invalid, since we no longer generate a new key.

NOTE: We will likely need to update the documentation of pg_tde_restore_encrypt to reflect this.

@jeltz jeltz force-pushed the tde/restore-command-old-keys branch 3 times, most recently from 5670b9b to 39e77fd Compare August 19, 2025 09:27
@jeltz jeltz changed the title Make pg_tde_restore_encrypt re-use old keys PG-1867 Make pg_tde_restore_encrypt re-use old keys Aug 19, 2025
Unfortunately the logic for generating a new key to protect the stream
cipher used to encrypt the WAL stream in our restore command was based
on totally incorrect assumptions due to how the recovery is implemented.

Recovery is a state machine which can go back and forward between one
mode where it streams from a primary and another where it first tries to
fetch WAL from the archive and if that fails from the pg_wal directory,
and in the pg_wal directory we may have files which are encrypted with
whatever keys were there originally.

To handle all the possible scenarios we remove the ability of
pg_tde_restore_encrypt to generate new keys and just has it use whatever
keys there are in the key file. This unfortunately means we open
ourselves to some attacks on the stream cipher if the system is tricked
into encrypting a different WAL stream at the same TLI and LSN as we
already have encrypted. As far as I know this should be rare under
normal operations since normally e.g. the WAL should be the same in the
archive as the one in pg_wal or which we receive through streaming.

Ideally we would want to fix this but for now it is better to have WAL
encryption with this weakness than to not have it at all.

This also incidentally fixes a bug we discovered caused by generating a
new key only invalidating one key rather than all keys which should have
become invalid, since we no longer generate a new key.
@jeltz jeltz force-pushed the tde/restore-command-old-keys branch from 39e77fd to ad2eaa1 Compare August 19, 2025 09:36
@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 11.11111% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.41%. Comparing base (aed49c0) to head (ad2eaa1).

❌ Your project status has failed because the head coverage (82.41%) is below the target coverage (90.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@                  Coverage Diff                  @@
##           TDE_REL_17_STABLE     #543      +/-   ##
=====================================================
- Coverage              82.46%   82.41%   -0.06%     
=====================================================
  Files                     25       25              
  Lines                   3228     3230       +2     
  Branches                 511      511              
=====================================================
  Hits                    2662     2662              
- Misses                   455      457       +2     
  Partials                 111      111              
Components Coverage Δ
access 84.40% <0.00%> (-0.23%) ⬇️
catalog 87.65% <ø> (ø)
common 77.77% <ø> (ø)
encryption 72.97% <ø> (ø)
keyring 73.21% <ø> (ø)
src 94.18% <ø> (ø)
smgr 96.53% <ø> (ø)
transam ∅ <ø> (∅)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jeltz jeltz merged commit 80e0bb0 into percona:TDE_REL_17_STABLE Aug 19, 2025
27 of 33 checks passed
Copy link
Member

@dAdAbird dAdAbird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a small suggestion

WALKeyCacheRec *keys;
WalEncryptionKey dummy = {
.type = WAL_KEY_TYPE_UNENCRYPTED,
.wal_start = {.tli = -1,.lsn = -1}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather use something like (~(XLogRecPtr)0) and (~(TimeLineID)0) (or better named constants like here) just for the reader's clarity sake. It's not obvious on the first glance that .tli and .lsn are unsigned types.

@jeltz jeltz deleted the tde/restore-command-old-keys branch August 19, 2025 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants