Skip to content

pg_query Parser Patches for Postgres 13.2 #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 8 commits into
base: REL_13_STABLE
Choose a base branch
from

Conversation

lfittl
Copy link
Owner

@lfittl lfittl commented Feb 20, 2021

Do not merge. This PR only exists to track the patches that are applied for pg_query (13-latest branch) on top of Postgres 13.2.

Due to pg_stat_statements using $1, etc for substitution of constants, the
parser needs to support additional locations where these values are
allowed to be passed in.

Examples:

CREATE USER test PASSWORD $1;
ALTER USER test ENCRYPTED PASSWORD $2;
SET SCHEMA $3;
SET ROLE $4;
SET SESSION AUTHORIZATION $5;
SET TIME ZONE $6;
SELECT EXTRACT($1 FROM TIMESTAMP $2);
SELECT DATE $1;
SELECT INTERVAL $1;
SELECT INTERVAL $1 YEAR;
SELECT INTERVAL (6) $1;
This is for compatibility with Postgres 9.6 and older, which used ?
as the replacement character in pg_stat_statements.

Note that this intentionally breaks use of ? as an operator in some
uncommon cases.

This patch will likely be removed with the next major parser release, and
should be considered deprecated.
This is helpful for tracking the extent of tokens in the scan output,
as this is made available by pg_query for uses such as syntax highlighting.
For syntax highlighting and extracting comments from a query, its very
helpful to know the exact locations of a comment in the query string.

Previously the lexer discarded all comments as whitespace, making it
impossible to determine where they are located in the query string. With
this change, the lexer returns them as SQL_COMMENT/C_COMMENT tokens.
This seems like an oversight in the commit that added support for
FETCH FIRST... WITH TIES, and causes the parsetree to always have
limitOption = LIMIT_OPTION_COUNT, even when no LIMIT/OFFSET is specified.
This frees up the memory allocated to memory contexts that are kept
for future allocations. This behaves similar to changing aset.c's
MAX_FREE_CONTEXTS to 0, but only does the cleanup when called, and
allows the freelist approach to be used during Postgres operations.
This allows other source units to have the accompanying functions for
the already exported plpgsql_adddatum.
This is a pg_query-specific patch that ensures we can use the split
function on the regression test files. Zero-length delimiters fail
at the scanner level in Postgres, and thus need to be removed.
@lfittl lfittl force-pushed the lfittl/pg-query-pg13.2 branch from da83e90 to 09455f9 Compare March 1, 2021 05:21
lfittl pushed a commit that referenced this pull request Nov 2, 2022
We've heard a couple of reports of people having trouble with
multi-gigabyte-sized query-texts files.  It occurred to me that on
32-bit platforms, there could be an issue with integer overflow
of calculations associated with the total query text size.
Address that with several changes:

1. Limit pg_stat_statements.max to INT_MAX / 2 not INT_MAX.
The hashtable code will bound it to that anyway unless "long"
is 64 bits.  We still need overflow guards on its use, but
this helps.

2. Add a check to prevent extending the query-texts file to
more than MaxAllocHugeSize.  If it got that big, qtext_load_file
would certainly fail, so there's not much point in allowing it.
Without this, we'd need to consider whether extent, query_offset,
and related variables shouldn't be off_t not size_t.

3. Adjust the comparisons in need_gc_qtexts() to be done in 64-bit
arithmetic on all platforms.  It appears possible that under duress
those multiplications could overflow 32 bits, yielding a false
conclusion that we need to garbage-collect the texts file, which
could lead to repeatedly garbage-collecting after every hash table
insertion.

Per report from Bruno da Silva.  I'm not convinced that these
issues fully explain his problem; there may be some other bug that's
contributing to the query-texts file becoming so large in the first
place.  But it did get that big, so #2 is a reasonable defense,
and #3 could explain the reported performance difficulties.

(See also commit 8bbe4cb, which addressed some related bugs.
The second Discussion: link is the thread that led up to that.)

This issue is old, and is primarily a problem for old platforms,
so back-patch.

Discussion: https://postgr.es/m/CAB+Nuk93fL1Q9eLOCotvLP07g7RAv4vbdrkm0cVQohDVMpAb9A@mail.gmail.com
Discussion: https://postgr.es/m/[email protected]
lfittl pushed a commit that referenced this pull request Nov 21, 2023
checkExtensionMembership() set the DUMP_COMPONENT_SECLABEL and
DUMP_COMPONENT_POLICY flags for extension member objects, even though
we lack any infrastructure for tracking extensions' initial settings
of these properties.  This is not OK.  The result was that a dump
would always include commands to set these properties for extension
objects that have them, with at least three negative consequences:

1. The restoring user might not have privilege to set these properties
on these objects.

2. The properties might be incorrect/irrelevant for the version of the
extension that's installed in the destination database.

3. The dump itself might fail, in the case of RLS properties attached
to extension tables that the dumping user lacks privilege to LOCK.
(That's because we must get at least AccessShareLock to ensure that
we don't fail while trying to decompile the RLS expressions.)

When and if somebody cares to invent initial-state infrastructure for
extensions' RLS policies and security labels, we could think about
finding another way around problem #3.  But in the absence of such
infrastructure, this whole thing is just wrong and we shouldn't do it.

(Note: this applies only to ordinary dumps; binary-upgrade dumps
still dump and restore extension member objects separately, with
all properties.)

Tom Lane and Jacob Champion.  Back-patch to all supported branches.

Discussion: https://postgr.es/m/[email protected]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant