pg_query Parser Patches for Postgres 13.2 #3

lfittl · 2021-02-20T12:39:52Z

Do not merge. This PR only exists to track the patches that are applied for pg_query (13-latest branch) on top of Postgres 13.2.

Due to pg_stat_statements using $1, etc for substitution of constants, the parser needs to support additional locations where these values are allowed to be passed in. Examples: CREATE USER test PASSWORD $1; ALTER USER test ENCRYPTED PASSWORD $2; SET SCHEMA $3; SET ROLE $4; SET SESSION AUTHORIZATION $5; SET TIME ZONE $6; SELECT EXTRACT($1 FROM TIMESTAMP $2); SELECT DATE $1; SELECT INTERVAL $1; SELECT INTERVAL $1 YEAR; SELECT INTERVAL (6) $1;

This is for compatibility with Postgres 9.6 and older, which used ? as the replacement character in pg_stat_statements. Note that this intentionally breaks use of ? as an operator in some uncommon cases. This patch will likely be removed with the next major parser release, and should be considered deprecated.

This is helpful for tracking the extent of tokens in the scan output, as this is made available by pg_query for uses such as syntax highlighting.

For syntax highlighting and extracting comments from a query, its very helpful to know the exact locations of a comment in the query string. Previously the lexer discarded all comments as whitespace, making it impossible to determine where they are located in the query string. With this change, the lexer returns them as SQL_COMMENT/C_COMMENT tokens.

This seems like an oversight in the commit that added support for FETCH FIRST... WITH TIES, and causes the parsetree to always have limitOption = LIMIT_OPTION_COUNT, even when no LIMIT/OFFSET is specified.

This frees up the memory allocated to memory contexts that are kept for future allocations. This behaves similar to changing aset.c's MAX_FREE_CONTEXTS to 0, but only does the cleanup when called, and allows the freelist approach to be used during Postgres operations.

This allows other source units to have the accompanying functions for the already exported plpgsql_adddatum.

This is a pg_query-specific patch that ensures we can use the split function on the regression test files. Zero-length delimiters fail at the scanner level in Postgres, and thus need to be removed.

We've heard a couple of reports of people having trouble with multi-gigabyte-sized query-texts files. It occurred to me that on 32-bit platforms, there could be an issue with integer overflow of calculations associated with the total query text size. Address that with several changes: 1. Limit pg_stat_statements.max to INT_MAX / 2 not INT_MAX. The hashtable code will bound it to that anyway unless "long" is 64 bits. We still need overflow guards on its use, but this helps. 2. Add a check to prevent extending the query-texts file to more than MaxAllocHugeSize. If it got that big, qtext_load_file would certainly fail, so there's not much point in allowing it. Without this, we'd need to consider whether extent, query_offset, and related variables shouldn't be off_t not size_t. 3. Adjust the comparisons in need_gc_qtexts() to be done in 64-bit arithmetic on all platforms. It appears possible that under duress those multiplications could overflow 32 bits, yielding a false conclusion that we need to garbage-collect the texts file, which could lead to repeatedly garbage-collecting after every hash table insertion. Per report from Bruno da Silva. I'm not convinced that these issues fully explain his problem; there may be some other bug that's contributing to the query-texts file becoming so large in the first place. But it did get that big, so #2 is a reasonable defense, and #3 could explain the reported performance difficulties. (See also commit 8bbe4cb, which addressed some related bugs. The second Discussion: link is the thread that led up to that.) This issue is old, and is primarily a problem for old platforms, so back-patch. Discussion: https://postgr.es/m/CAB+Nuk93fL1Q9eLOCotvLP07g7RAv4vbdrkm0cVQohDVMpAb9A@mail.gmail.com Discussion: https://postgr.es/m/[email protected]

checkExtensionMembership() set the DUMP_COMPONENT_SECLABEL and DUMP_COMPONENT_POLICY flags for extension member objects, even though we lack any infrastructure for tracking extensions' initial settings of these properties. This is not OK. The result was that a dump would always include commands to set these properties for extension objects that have them, with at least three negative consequences: 1. The restoring user might not have privilege to set these properties on these objects. 2. The properties might be incorrect/irrelevant for the version of the extension that's installed in the destination database. 3. The dump itself might fail, in the case of RLS properties attached to extension tables that the dumping user lacks privilege to LOCK. (That's because we must get at least AccessShareLock to ensure that we don't fail while trying to decompile the RLS expressions.) When and if somebody cares to invent initial-state infrastructure for extensions' RLS policies and security labels, we could think about finding another way around problem #3. But in the absence of such infrastructure, this whole thing is just wrong and we shouldn't do it. (Note: this applies only to ordinary dumps; binary-upgrade dumps still dump and restore extension member objects separately, with all properties.) Tom Lane and Jacob Champion. Back-patch to all supported branches. Discussion: https://postgr.es/m/[email protected]

lfittl mentioned this pull request Feb 20, 2021

pg_query Parser Patches for Postgres 13.1 #1

Closed

lfittl added 7 commits February 28, 2021 21:11

pg_query: Track yyllocend in lexer

d994cb6

This is helpful for tracking the extent of tokens in the scan output, as this is made available by pg_query for uses such as syntax highlighting.

LimitOption: Correctly order LIMIT_OPTION_DEFAULT enum value first

6ca5d16

This seems like an oversight in the commit that added support for FETCH FIRST... WITH TIES, and causes the parsetree to always have limitOption = LIMIT_OPTION_COUNT, even when no LIMIT/OFFSET is specified.

PL/pgSQL: Make plpgsql_start_datums and plpgsql_finish_datums extern

ddfed23

This allows other source units to have the accompanying functions for the already exported plpgsql_adddatum.

Avoid zero-length delimiter in regression test files

09455f9

This is a pg_query-specific patch that ensures we can use the split function on the regression test files. Zero-length delimiters fail at the scanner level in Postgres, and thus need to be removed.

lfittl force-pushed the lfittl/pg-query-pg13.2 branch from da83e90 to 09455f9 Compare March 1, 2021 05:21

lfittl added the not-for-upstream label Nov 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

pg_query Parser Patches for Postgres 13.2 #3

pg_query Parser Patches for Postgres 13.2 #3

Uh oh!

lfittl commented Feb 20, 2021

Uh oh!

Uh oh!

pg_query Parser Patches for Postgres 13.2 #3

Are you sure you want to change the base?

pg_query Parser Patches for Postgres 13.2 #3

Uh oh!

Conversation

lfittl commented Feb 20, 2021

Uh oh!

Uh oh!