Commit eb5834d

Further improvement of make_greater_string.
Make sure that it considers all the possibilities that the old code did, instead of trying only one possibility per character position. To keep the runtime in bounds, instead tweak the character incrementers to not try every possible multibyte character code. Remove unnecessary logic to restore the old character value on failure. Additional comment and formatting cleanup.
1 parent fae54e4 commit eb5834d

2 files changed: +183 -188 lines changed

src/backend/utils/adt/selfuncs.c

+28 -9
@@ -5701,13 +5701,23 @@ byte_increment(unsigned char *ptr, int len)
  * and "9" is seen as largest by the collation, and append that to the given
  * prefix before trying to find a string that compares as larger.
  *
- * If we max out the righthand byte, truncate off the last character
- * and start incrementing the next. For example, if "z" were the last
- * character in the sort order, then we could produce "foo" as a
- * string greater than "fonz".
+ * To search for a greater string, we repeatedly "increment" the rightmost
+ * character, using an encoding-specific character incrementer function.
+ * When it's no longer possible to increment the last character, we truncate
+ * off that character and start incrementing the next-to-rightmost.
+ * For example, if "z" were the last character in the sort order, then we
+ * could produce "foo" as a string greater than "fonz".
  *
  * This could be rather slow in the worst case, but in most cases we
  * won't have to try more than one or two strings before succeeding.
+ *
+ * Note that it's important for the character incrementer not to be too anal
+ * about producing every possible character code, since in some cases the only
+ * way to get a larger string is to increment a previous character position.
+ * So we don't want to spend too much time trying every possible character
+ * code at the last position. A good rule of thumb is to be sure that we
+ * don't try more than 256*K values for a K-byte character (and definitely
+ * not 256^K, which is what an exhaustive search would approach).
  */
 Const *
 make_greater_string(const Const *str_const, FmgrInfo *ltproc, Oid collation)
@@ -5779,17 +5789,19 @@ make_greater_string(const Const *str_const, FmgrInfo *ltproc, Oid collation)
 		}
 	}

+	/* Select appropriate character-incrementer function */
 	if (datatype == BYTEAOID)
-		charinc = &byte_increment;
+		charinc = byte_increment;
 	else
 		charinc = pg_database_encoding_character_incrementer();

+	/* And search ... */
 	while (len > 0)
 	{
-		int charlen;
+		int charlen;
 		unsigned char *lastchar;
-		Const *workstr_const;

+		/* Identify the last character --- for bytea, just the last byte */
 		if (datatype == BYTEAOID)
 			charlen = 1;
 		else
@@ -5799,9 +5811,15 @@ make_greater_string(const Const *str_const, FmgrInfo *ltproc, Oid collation)
 		/*
 		 * Try to generate a larger string by incrementing the last character
 		 * (for BYTEA, we treat each byte as a character).
+		 *
+		 * Note: the incrementer function is expected to return true if it's
+		 * generated a valid-per-the-encoding new character, otherwise false.
+		 * The contents of the character on false return are unspecified.
 		 */
-		if (charinc(lastchar, charlen))
+		while (charinc(lastchar, charlen))
 		{
+			Const *workstr_const;
+
 			if (datatype == BYTEAOID)
 				workstr_const = string_to_bytea_const(workstr, len);
 			else
@@ -5825,7 +5843,8 @@ make_greater_string(const Const *str_const, FmgrInfo *ltproc, Oid collation)
 		}

 		/*
-		 * Truncate off the last character or byte.
+		 * No luck here, so truncate off the last character and try to
+		 * increment the next one.
 		 */
 		len -= charlen;
 		workstr[len] = '\0';
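
The search loop in the hunks above can be illustrated outside PostgreSQL with a minimal, standalone sketch. Everything named `sketch_*` is hypothetical and not part of the commit: it handles only single-byte characters, a plain function pointer stands in for `ltproc` and the collation, and the working buffer is modified in place instead of building `Const` nodes. It is meant only to show why the inner test is a `while` rather than an `if`: with a comparator that does not follow byte order, several increments at one position may be needed before a greater string appears.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for ltproc: returns true if a sorts before b. */
typedef bool (*sketch_lt_fn) (const char *a, const char *b);

/* Hypothetical single-byte incrementer: fails once the byte is 0xFF. */
static bool
sketch_byte_increment(unsigned char *ptr, int len)
{
	assert(len == 1);
	if (*ptr >= 255)
		return false;
	(*ptr)++;
	return true;
}

/* Plain byte-order comparison (an assumption, not a real collation). */
static bool
sketch_lt_bytes(const char *a, const char *b)
{
	return strcmp(a, b) < 0;
}

/* Case-insensitive comparison, a toy collation that is not byte order. */
static bool
sketch_lt_nocase(const char *a, const char *b)
{
	for (;; a++, b++)
	{
		int		ca = tolower((unsigned char) *a);
		int		cb = tolower((unsigned char) *b);

		if (ca != cb)
			return ca < cb;
		if (ca == 0)
			return false;		/* strings are equal */
	}
}

/*
 * Rewrite workstr (initially a copy of orig) into a string that lt()
 * reports as greater than orig.  Returns false if none is found, in
 * which case workstr is left empty.
 */
static bool
sketch_make_greater(char *workstr, const char *orig, sketch_lt_fn lt)
{
	int		len = (int) strlen(workstr);

	while (len > 0)
	{
		unsigned char *lastchar = (unsigned char *) &workstr[len - 1];

		/* Keep incrementing this position until it works or runs out */
		while (sketch_byte_increment(lastchar, 1))
		{
			if (lt(orig, workstr))
				return true;
		}

		/* No luck here: truncate and try the next-to-rightmost byte */
		len -= 1;
		workstr[len] = '\0';
	}
	return false;
}
```

Under byte order, starting from `"fo\xff"` the last byte cannot be incremented, so the sketch truncates and yields `"fp"`. Under the case-insensitive toy comparator, starting from `"foZ"` the first increment gives `"fo["`, which still sorts below the original, and the loop continues at the same position until it reaches `"fo{"`. A single try per position would instead have truncated to `"fo"` and returned the looser bound `"fp"`.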
