Skip to content

pg_tde test PR #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 149 commits into
base: TDE_REL_17_STABLE
Choose a base branch
from
Open

pg_tde test PR #3

wants to merge 149 commits into from

Conversation

jeltz
Copy link
Owner

@jeltz jeltz commented Feb 24, 2025

This is a testy PR!

@jeltz jeltz changed the base branch from master to TDE_REL_17_STABLE February 24, 2025 18:14
jeltz pushed a commit that referenced this pull request Mar 5, 2025
1. TruncateMultiXact() performs the SLRU truncations in a critical
section. Deleting the SLRU segments calls ForwardSyncRequest(), which
will try to compact the request queue if it's full
(CompactCheckpointerRequestQueue()). That in turn allocates memory,
which is not allowed in a critical section. Backtrace:

    TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
    postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
    postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
    postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
    postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
    postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
    postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
    postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
    postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
    postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
    postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
    postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
    postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
    postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
    postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
    /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
    postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]

To fix, bail out in CompactCheckpointerRequestQueue() without doing
anything, if it's called in a critical section. That covers the above
call path, as well as any other similar cases where
RegisterSyncRequest might be called in a critical section.

2. After fixing that, another problem became apparent: Autovacuum
process doing that truncation can deadlock with the checkpointer
process. TruncateMultiXact() sets "MyProc->delayChkptFlags |=
DELAY_CHKPT_START". If the sync request queue is full and cannot be
compacted, the process will repeatedly sleep and retry, until there is
room in the queue. However, if the checkpointer is trying to start a
checkpoint at the same time, and is waiting for the DELAY_CHKPT_START
processes to finish, the queue will never shrink.

More concretely, the autovacuum process is stuck here:

    #0  0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
    #2  WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1,
        wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
    #3  0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949)
        at ../src/backend/storage/ipc/latch.c:538
    postgres#4  0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
    postgres#5  0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
    postgres#6  0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
    postgres#7  0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
    postgres#8  TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
    postgres#9  0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
    postgres#10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
    postgres#11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
    postgres#12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569

and the checkpointer is stuck here:

    #0  0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    #2  0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
    #3  0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
    postgres#4  0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464

To fix, add AbsorbSyncRequests() to the loops where the checkpointer
waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to
finish.

Backpatch to v14. Before that, SLRU deletion didn't call
RegisterSyncRequest, which avoided this failure. I'm not sure if there
are other similar scenarios on older versions, but we haven't had
any such reports.

Discussion: https://www.postgresql.org/message-id/[email protected]
jeltz and others added 26 commits April 4, 2025 19:48
If we look up something about a relation in the command start event
trigger we need to hold onto that lock until the end of the command
and not release it so no other commd is able to change the data we
looked at between now and when the DDL command itself actually locks
the relation.
For now we do not use this field but we plan to use it when encrypting
Wrelation data and WAL.
We might as well increase the entropy by adding a random base value to
the IVs used used when encrypting relations. But since for now the pair
of key + IV is generated and used together it adds little extra security
over what we already have.

We add the IV with XOR since that is a cheap and easy operation which
uases no extra collissions.
Same as last commit but for the WAL encryption.

Rewrote the calculation in a way gcc's vectorizer likes, as verified
with Godbolt. The code generated by clang is ok and branch free but it
fails to properly vectorize both before and after.
This way they can eventually be made more similar to using OpenSSL
for encrypting and decrypting the CTR stream.
While this does not merge the two similar functions it removes a lot
of cruft from pg_tde_read_one_map_entry2() making the functions more
similar and the code easier to read.
…l key

We already had protection against decrypting relation keys with the wrong
principal key but to properly protect us against new relation keys being
encrypted with the wrong principal key we need to also verify that the
principal key was correct when we fetch the principal key from the key
provider. We do so by signing the principal key info header of the key map
file using AES-128-GCM.

This way we cannot get a jumbled mess of relation keys encrypted with
multiple different principal keys.
This follows the pattern in the rest of the PostgreSQL source code.
The end LSN of the current buffer to decript was exclusive while the end
LSN of the key was inclusive which had led to confusion and an
off-by-one bug.

Also add a simple test case for the WAL encryption using the logical
decoding's test plugin.
This makes the code easier to read plus gives us things like letting
the compiler check for unused variables.
Several variables were at a too long scope or were initialized when
they should not be. Plus convert a while loop to a for loop.
The comment about all-zero pages created by smgrzeroextend() sounded
much scarier than the reality. In fact trying to encrypt these
all-zero pages might not only be a waste of CPU cycles but also could
decrease security by making us re-use the same IV first with all zeros
and then with the actual data. And the extra amount of protection
we gain from encrypting them is minuscule since they are only added
at the end of the table, soon overwritten and only gives the attacker
a very slightly more accurate table size.
There is no point in looking up the length of the file when we already
know it.
* PG-1457 Rename some key management funcions

* PG-1457 Fix some tests

* PG-1457 Hit CI

* PG-1457 Rename key in CI setup

* PG-1457 Rename pg_tde_verify_global_principal_key to pg_tde_verify_server_principal_key

* PG-1457 Rename keys in tests

* PG-1457 Renaming

* PG-1457 Renaming

* PG-1457 Fix tests

* PG-1457 Fix tests

* PG-1457 Fix tabs

* PG-1457 Fix tests

* PG-1457 Fix tests

* PG-1457 Fix

* PG-1457 Fix test

* PG-1457 Fix test

* PG-1457 Hit CI

* PG-1457 Fix after rebase

* PG-1457 Fix

* PG-1457 Fix

* PG-1457 Fix

* PG-1457 Fix test

* PG-1457 Fix tests

* PG-1457 Fix tests

* PG-1457 Fix
It's not clear to someone new to the code that "key provider" in these
files refers to what's called "key provider type" elsewhere.

Rename these to make it easier for the next person.
By using SPI and a SQL language function to force the rewrite of
sequences we made life unnecessarily hard for ourselves and also
intoduced a bug where ALTER TABLE could break if the pg_tde extension's
functions were not in the search path.

Instead we re-use the code for updating sequence persistence whe table
persistance is changed. And we only need to look at the table's
presistance since it is always the same as the one of the sequences.
This file is no longer used by our scripts and is easy to generate when
running pgindent.
The R was pulling double duty in the old name.
This is just a conversion of the google doc into markdown, with
actualizing some of the outdated details in the document.

The last section (researc/investigation topics) is left out, as that
doesn't make much sense in a public documentation.
When we did a recovery, even if tde_heap was not used, 98% of the time
was spent in pg_tde_get_key_from_file() due to our SMGR missing the
shortcicruit from mdcreate() which skips running if the fork already
is open.

The issue was made worse by us not caching negative SMGR key lookups but
that is the subject of another commit.
…s#154)

PG-1457

Replace `principal key` with just a `key` on user API level, as it's the only key that user can directly interact with.
Commit e3a87b4 added a way to call
ereport() without any extra parentheses which was backported all the way
back to PostgreSQL 12. The new modern way improves code readability
slightly.
jeltz and others added 12 commits April 25, 2025 15:50
HeapTupleIsValid() is actually just a null check but PostgreSQL's
codebase almost always uses this macro and we had a confusion where we
both had a null check and called this macro so we at least should pick
just one of the two ways to write it. And here I picked the most
commonly used way in the PostgreSQL codebase.
We apparently had the same alternate file twice in our repo.
For some reason these functions were declared twice, once using the
macro and once without it.
To run pg_tde tests with `make check` we have to add pg_buffercache and
test_decoding extensions to temporary pg installation.
There is no user of this return value and furthermore the function can
only ever return true.
These pgindent excludes are no longer relevant.
Since upstream does not ignore it neither should we.
The .gitignore entry was left in the old location when the source for
the executable was moved.
jeltz added 16 commits April 26, 2025 14:58
One of them even always had meson in the name even when make was ran.
Most of the result files and the logs were missing from the artifact
created when running make check-world.
Instead of giving them numbers we call them pg_tde_ddl_start and
pg_tde_ddl_end. Since the triggers are not on the same event the names
do not matter for the order they are executed in.
In tde_keyring.c we do a lot of switching on the keyring type, some of
it which could be done in a slightly nicer way.
The meson and the make scripts had diverged a lot, so this commit fixes
that plus some other inconsistencies.
The TDE_MODE environment variable disables tests we actually want to run
in our Github Actions. This change is also necessary to in a future
commit disable the pg_tde tests in the global TDE mode.
The purpose of the global TDE mode is to run PostgreSQL's normal test
suite but with our extension so running the pg_tde test suite when in
that mode makes no sense.

Meson supports disabling test suites with --no-suite so we only need to
do this for the Makefile.
Now that we no longer run the pg_tde suite in the global TDE mode we can
remove all the code which was there to support it.
This removes some unnecessary queries and formats the queries to be
easier to read.
Also add a couple of tests for the DEFAULT case to avoid regressions.
Improve readability for the provider vs key type logic.
This enum was only used in one place and oscured the two dimensions of
provider types (database vs global) and principal keys (server vs
default vs database).
Make the intent a lot cleaner by doing the check when a global provider
us used instead of deciding to do it or not per user facing function.
While these tests test our changes to pg_waldump they are quite easy to
overlook right now and where exactly should we draw the line? These
tests are not something we ever want to upstream and in the future when
we figure out how we want to make sure pg_waldump works with encrypted
WAL we likely will want to have the tests for that solution in the same
folder as our other tests anyway.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants