CITEXT data type #20028
base: main
2690813 to 1885cd1
src/current/v25.3/citext.md (Outdated)

The `CITEXT` [data type]({% link {{ page.version.version }}/data-types.md %}) stores case-insensitive strings.

All `CITEXT` values are folded to lowercase before comparison. This is handled internally with the [`lower()`]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) function.
> This is handled internally with the [`lower()`]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) function

We actually diverged a bit from how we handle CITEXT internally. Instead of `lower()`, CRDB handles CITEXT similarly to a collated string with the `"und-u-ks-level2"` locale.
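To make the comparison in the comment above concrete, here is a minimal sketch that places an explicitly collated string next to a `CITEXT` comparison. It assumes that CockroachDB accepts the `und-u-ks-level2` locale tag in a `COLLATE` expression. The comment names the locale but does not show this syntax, so treat it as illustrative rather than as a description of the internal implementation.

```sql
-- Assumption: "und-u-ks-level2" (undetermined locale, level-2 strength)
-- is accepted as a collation name. Level-2 strength ignores case
-- differences but still distinguishes accents.
SELECT 'foo' COLLATE "und-u-ks-level2" = 'FOO' COLLATE "und-u-ks-level2" AS collated_eq;

-- The same comparison written with CITEXT, which the comment says
-- behaves similarly to the collated string above.
SELECT 'foo'::CITEXT = 'FOO'::CITEXT AS citext_eq;
```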
I think the goal here is to describe the behavior, not the implementation details like collated strings. Here's an alternative rewording I'll propose:
> The `CITEXT` data type represents a case-insensitive string. The casing of values is preserved through storage and retrieval, just like `STRING` values. The key difference with `STRING` values is that comparisons between `CITEXT` values are case-insensitive.
And add some examples that show what we mean by that, e.g.:
```sql
CREATE TABLE t (c0 CITEXT, c1 CITEXT);
-- CREATE TABLE

INSERT INTO t VALUES ('foo', 'FOO');
-- INSERT 0 1

-- Casing is preserved during storage and retrieval.
SELECT * FROM t;
--   c0  |  c1
-- ------+------
--   foo | FOO
-- (1 row)

-- Comparisons are insensitive to casing.
SELECT c0 = c1 FROM t;
--   ?column?
-- ------------
--      t
-- (1 row)
```
@mgartner There's a fuller example near the end of the doc that conveys the above -- do you think that will suffice?
@paulniziolek @mgartner TFTRs -- I ended up simplifying this doc, in part to address your comments. Please have a look!
Looks great!
Nice! I have a couple of suggestions to consider, which I don't have strong opinions on.
src/current/v25.3/citext.md (Outdated)

(1 row)
~~~

## Supported casting and conversion
I think we support more types based on the code - it seems like most types that we can cast from STRING we can also cast from CITEXT. In the current form, this section suggests that only STRING -> CITEXT and CITEXT -> STRING casts are possible.
I wonder whether it'd be better to omit the section about casts altogether? (It doesn't seem like we're being precise about documenting all casts that we support, across a couple of types I looked at.)
Thanks for noticing this -- in this case, I'll remove the Casting section from the doc for simplicity, since it's already mentioned in the example and the rest can be gleaned from STRING.
LGTM once it's added to the table of contents (unless that was intentional but I'm guessing not)
@@ -0,0 +1,98 @@
---
title: CITEXT
suggest adding this page to the table of contents (assume not intentional)
docs_area: reference.sql
---

The `CITEXT` [data type]({% link {{ page.version.version }}/data-types.md %}) represents a case-insensitive string. Like `STRING` values, `CITEXT` values preserve their casing when stored and retrieved. Unlike `STRING` values, comparisons between `CITEXT` values are case-insensitive for all Unicode characters that have a defined uppercase/lowercase mapping (e.g., `'É' = 'é'`).
"Unicode characters" could be a link to e.g. https://en.wikipedia.org/wiki/List_of_Unicode_characters
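The proposed introduction says that casing is preserved while comparisons ignore case for any Unicode character with an uppercase/lowercase mapping. Here is a minimal sketch of that behavior. It assumes a CockroachDB v25.3 cluster with `CITEXT` available, and it uses literal casts instead of a table to keep things short. The `STRING` query is included only for contrast.

```sql
-- Casing is preserved on retrieval: the value comes back as 'Éclair'.
SELECT 'Éclair'::CITEXT AS c;

-- 'É' and 'é' differ only in case, so CITEXT treats them as equal.
SELECT 'É'::CITEXT = 'é'::CITEXT AS citext_eq;

-- The same comparison on STRING values is case-sensitive.
SELECT 'É'::STRING = 'é'::STRING AS string_eq;
```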
DOC-14015