pg_query Parser Patches for Postgres 13.8 #4

lfittl · 2022-11-02T22:01:46Z

Do not merge. This PR only exists to track the patches that are applied for pg_query (13-latest branch) on top of Postgres 13.8.

Because pg_stat_statements substitutes constants with $1, $2, and so on, the parser needs to accept parameter references in additional locations. Examples: CREATE USER test PASSWORD $1; ALTER USER test ENCRYPTED PASSWORD $2; SET SCHEMA $3; SET ROLE $4; SET SESSION AUTHORIZATION $5; SET TIME ZONE $6; SELECT EXTRACT($1 FROM TIMESTAMP $2); SELECT DATE $1; SELECT INTERVAL $1; SELECT INTERVAL $1 YEAR; SELECT INTERVAL (6) $1;

This is for compatibility with Postgres 9.6 and older, which used ? as the replacement character in pg_stat_statements. Note that this intentionally breaks use of ? as an operator in some uncommon cases. This patch will likely be removed with the next major parser release, and should be considered deprecated.

This is helpful for tracking the extent of tokens in the scan output, as this is made available by pg_query for uses such as syntax highlighting.
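To illustrate why an end location is useful next to the start location, here is a minimal sketch in Python. It is not the pg_query lexer: the token classes (IDENT, ICONST, OP) and the `scan` function are hypothetical and heavily simplified. It only shows that once each token carries both offsets, a consumer such as a syntax highlighter can slice the exact token text out of the query string.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str
    start: int  # offset of the first character of the token
    end: int    # offset one past the last character of the token


# Hypothetical, simplified token classes; a real SQL lexer has many more.
TOKEN_RE = re.compile(
    r"(?P<IDENT>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<ICONST>\d+)"
    r"|(?P<OP>[^\sA-Za-z_0-9])"
)


def scan(query: str) -> List[Token]:
    """Return every token with its start and end offset; whitespace is skipped."""
    return [Token(m.lastgroup, m.start(), m.end()) for m in TOKEN_RE.finditer(query)]


if __name__ == "__main__":
    query = "SELECT 10 + x"
    for tok in scan(query):
        # With both offsets available, the token text is a plain slice.
        print(tok.kind, tok.start, tok.end, repr(query[tok.start:tok.end]))
```

With only a start offset, the consumer would have to re-derive each token's length; carrying the end offset avoids that second pass.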

For syntax highlighting and extracting comments from a query, it's very helpful to know the exact location of each comment in the query string. Previously the lexer discarded all comments as whitespace, making it impossible to determine where they are located in the query string. With this change, the lexer returns them as SQL_COMMENT/C_COMMENT tokens.
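The idea of returning comments as located tokens can be sketched as follows. This is a standalone Python illustration, not the patched Postgres scanner: `scan_comments` and `CommentToken` are hypothetical names, and the sketch only understands `--` line comments, nested `/* */` block comments, and plain single-quoted strings. Dollar-quoted strings, quoted identifiers, and escape-string syntax are deliberately left out.

```python
from typing import List, NamedTuple


class CommentToken(NamedTuple):
    kind: str   # "SQL_COMMENT" for "-- ...", "C_COMMENT" for "/* ... */"
    start: int  # offset of the first character of the comment
    end: int    # offset one past the last character of the comment


def scan_comments(query: str) -> List[CommentToken]:
    """Return comment tokens with their locations instead of discarding them."""
    tokens: List[CommentToken] = []
    i, n = 0, len(query)
    while i < n:
        if query.startswith("--", i):
            # A line comment runs to the end of the line (newline excluded).
            end = query.find("\n", i)
            end = n if end == -1 else end
            tokens.append(CommentToken("SQL_COMMENT", i, end))
            i = end
        elif query.startswith("/*", i):
            # Block comments may nest, so track the nesting depth.
            depth, j = 1, i + 2
            while j < n and depth > 0:
                if query.startswith("/*", j):
                    depth, j = depth + 1, j + 2
                elif query.startswith("*/", j):
                    depth, j = depth - 1, j + 2
                else:
                    j += 1
            tokens.append(CommentToken("C_COMMENT", i, j))
            i = j
        elif query[i] == "'":
            # Skip a single-quoted string so "--" inside it is not a comment.
            j = i + 1
            while j < n:
                if query[j] == "'":
                    if j + 1 < n and query[j + 1] == "'":
                        j += 2  # doubled quote is an escaped quote
                        continue
                    break
                j += 1
            i = j + 1
        else:
            i += 1
    return tokens
```

A caller can recover the comment text with `query[tok.start:tok.end]`, or remove comments by cutting those ranges out of the string.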

This seems like an oversight in the commit that added support for FETCH FIRST... WITH TIES, and causes the parsetree to always have limitOption = LIMIT_OPTION_COUNT, even when no LIMIT/OFFSET is specified.
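A likely explanation for this behavior, stated here as an assumption rather than taken from the patch itself: parse nodes are allocated zero-filled, so an enum field that the grammar never sets reads back as whichever enumerator has the value 0. The Python sketch below models that with two hypothetical enum classes. Only "LIMIT_OPTION_COUNT is first before the patch" and "LIMIT_OPTION_DEFAULT is first after it" follow from the description above; the relative order of the remaining values is assumed.

```python
from enum import IntEnum


class UnpatchedLimitOption(IntEnum):
    # Assumed pre-patch order: COUNT happens to be the zero value.
    LIMIT_OPTION_COUNT = 0
    LIMIT_OPTION_WITH_TIES = 1
    LIMIT_OPTION_DEFAULT = 2


class PatchedLimitOption(IntEnum):
    # Post-patch order: DEFAULT is first, so it becomes the zero value.
    LIMIT_OPTION_DEFAULT = 0
    LIMIT_OPTION_COUNT = 1
    LIMIT_OPTION_WITH_TIES = 2


def zero_initialized(enum_cls):
    """Return the member that a zero-filled, never-assigned field decodes to."""
    return enum_cls(0)


if __name__ == "__main__":
    # A statement without LIMIT/OFFSET never assigns the field.
    print(zero_initialized(UnpatchedLimitOption).name)
    print(zero_initialized(PatchedLimitOption).name)
```

Under these assumptions, putting the "nothing specified" value first means an untouched field carries the right meaning without the grammar having to set it explicitly.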

This frees up the memory allocated to memory contexts that are kept for future allocations. This behaves similarly to changing aset.c's MAX_FREE_CONTEXTS to 0, but only does the cleanup when called, and allows the freelist approach to be used during Postgres operations.
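The trade-off described here (keep a freelist for fast reuse, but release it on request) can be modeled with a small pool. This is only a conceptual Python sketch, not aset.c: `Context`, `ContextPool`, and `delete_free_list` are hypothetical names, the limit value is illustrative, and Python's garbage collector stands in for explicit `free()` calls.

```python
from typing import List

MAX_FREE_CONTEXTS = 100  # named after the aset.c constant; value is illustrative


class Context:
    """Stand-in for a memory context that owns one initial block."""

    def __init__(self) -> None:
        self.blocks: List[bytearray] = [bytearray(8192)]

    def reset(self) -> None:
        # Keep only the initial block so the context is cheap to reuse.
        del self.blocks[1:]


class ContextPool:
    def __init__(self, max_free: int = MAX_FREE_CONTEXTS) -> None:
        self.max_free = max_free
        self.free: List[Context] = []

    def create(self) -> Context:
        # Reuse a cached context when one is available.
        return self.free.pop() if self.free else Context()

    def delete(self, ctx: Context) -> None:
        ctx.reset()
        if len(self.free) < self.max_free:
            self.free.append(ctx)  # cache it for a future create()
        # Otherwise the context is simply dropped.

    def delete_free_list(self) -> int:
        """Release every cached context on demand; return how many were released."""
        released = len(self.free)
        self.free.clear()
        return released
```

Setting `max_free` to 0 would disable caching entirely; the explicit `delete_free_list()` call keeps the cache during normal operation and empties it only when the caller asks.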

This allows other source units to have the accompanying functions for the already exported plpgsql_adddatum.

This is a pg_query-specific patch that ensures we can use the split function on the regression test files. Zero-length delimiters fail at the scanner level in Postgres, and thus need to be removed.

In the latest version of Apple's macOS SDK, <sys/socket.h> fails to compile if "REF" is #define'd as something. Apple may or may not agree that this is a bug, and even if they do accept the bug report I filed, they probably won't fix it very quickly. In the meantime, our back branches will all fail to compile gram.y. v15 and HEAD currently escape the problem thanks to the refactoring done in 98e93a1, but that's purely accidental. Moreover, since that patch removed a widely-visible inclusion of <netdb.h>, back-patching it seems too likely to break third-party code. Instead, change the token's code name to REF_P, following our usual convention for naming parser tokens that are likely to have symbol conflicts. The effects of that should be localized to the grammar and immediately surrounding files, so it seems like a safer answer. Per project policy that we want to keep recently-out-of-support branches buildable on modern systems, back-patch all the way to 9.2. Discussion: https://postgr.es/m/[email protected]

In a similar effort to f736e18 and 110d817, fixup various usages of string functions where a more appropriate function is available and more fit for purpose. These changes include:
1. Use cstring_to_text_with_len() instead of cstring_to_text() when working with a StringInfoData and the length can easily be obtained.
2. Use appendStringInfoString() instead of appendStringInfo() when no formatting is required.
3. Use pstrdup(...) instead of psprintf("%s", ...)
4. Use pstrdup(...) instead of psprintf(...) (with no formatting)
5. Use appendPQExpBufferChar() instead of appendPQExpBufferStr() when the length of the string being appended is 1.
6. appendStringInfoChar() instead of appendStringInfo() when no formatting is required and string is 1 char long.
7. Use appendPQExpBufferStr(b, .) instead of appendPQExpBuffer(b, "%s", .)
8. Don't use pstrdup when it's fine to just point to the string constant.
I (David) did find other cases of #8 but opted to use #4 instead as I wasn't certain enough that applying #8 was ok (e.g in hba.c)
Author: Ranier Vilela, David Rowley
Discussion: https://postgr.es/m/CAApHDvo2j2+RJBGhNtUz6BxabWWh2Jx16wMUMWKUjv70Ver1vg@mail.gmail.com

lfittl and others added 9 commits November 2, 2022 14:58

pg_query: Track yyllocend in lexer

386cf95

This is helpful for tracking the extent of tokens in the scan output, as this is made available by pg_query for uses such as syntax highlighting.

LimitOption: Correctly order LIMIT_OPTION_DEFAULT enum value first

e855d8c

This seems like an oversight in the commit that added support for FETCH FIRST... WITH TIES, and causes the parsetree to always have limitOption = LIMIT_OPTION_COUNT, even when no LIMIT/OFFSET is specified.

PL/pgSQL: Make plpgsql_start_datums and plpgsql_finish_datums extern

4850baf

This allows other source units to have the accompanying functions for the already exported plpgsql_adddatum.

Avoid zero-length delimiter in regression test files

48cc023

This is a pg_query-specific patch that ensures we can use the split function on the regression test files. Zero-length delimiters fail at the scanner level in Postgres, and thus need to be removed.

lfittl force-pushed the lfittl/pg-query-pg13.8 branch from fd92673 to c3506d3 on November 2, 2022 22:11

lfittl added the not-for-upstream label Nov 21, 2023
