CITEXT data type #20028

Open
wants to merge 4 commits into
base: main
taroface
Contributor

netlify bot commented Jul 30, 2025

Netlify Preview

Name Link
🔨 Latest commit 2690813
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-docs/deploys/688a4520afbdc80008a458bd
😎 Deploy Preview https://deploy-preview-20028--cockroachdb-docs.netlify.app

The `CITEXT` [data type]({% link {{ page.version.version }}/data-types.md %}) stores case-insensitive strings.

All `CITEXT` values are folded to lowercase before comparison. This is handled internally with the [`lower()`]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) function.

This is handled internally with the [lower()]({% link {{ page.version.version }}/functions-and-operators.md %}#string-and-byte-functions) function

The internal handling of `CITEXT` actually diverges a bit from this. Instead of using `lower()`, CRDB handles `CITEXT` similarly to a collated string with the `und-u-ks-level2` locale.
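
To make the collated-string comparison above concrete, here is a minimal sketch using CockroachDB's `COLLATE` syntax. It assumes that the `und-u-ks-level2` locale tag named in this comment is accepted in user-facing SQL, which is not confirmed in this thread; it illustrates the described behavior and is not a statement of how `CITEXT` is implemented.

```sql
-- Sketch only. Assumption: the "und-u-ks-level2" locale tag from the comment
-- above can be used directly with COLLATE in a query.
-- A level-2 (secondary strength) collation is expected to treat values that
-- differ only by case as equal.
SELECT 'foo' COLLATE "und-u-ks-level2" = 'FOO' COLLATE "und-u-ks-level2";
```

Expected output is intentionally omitted, since the query has not been run against a cluster here.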

Contributor

I think the goal here is to describe the behavior, not the implementation details like collated strings. Here's an alternative rewording I'll propose:

The `CITEXT` data type represents a case-insensitive string. The casing of values is preserved through storage and retrieval, just like `STRING` values. The key difference from `STRING` values is that comparisons between `CITEXT` values are case-insensitive.

And add some examples that show what we mean by that, e.g.:

CREATE TABLE t (c0 CITEXT, c1 CITEXT);
-- CREATE TABLE

INSERT INTO t VALUES ('foo', 'FOO');
-- INSERT 0 1

-- Casing is preserved during storage and retrieval.
SELECT * FROM t;
--   c0  | c1
-- ------+------
--   foo | FOO
-- (1 row)

-- Comparisons are insensitive to casing.
SELECT c0 = c1 FROM t;
--   ?column?
-- ------------
--      t
-- (1 row)

Contributor Author

@mgartner There's a fuller example near the end of the doc that conveys the above -- do you think that will suffice?

netlify bot commented Jul 30, 2025

Netlify Preview

Name Link
🔨 Latest commit 8095ac4
🔍 Latest deploy log https://app.netlify.com/projects/cockroachdb-docs/deploys/688bd0eac772e0000833be7d
😎 Deploy Preview https://deploy-preview-20028--cockroachdb-docs.netlify.app
@taroface
Contributor Author

@paulniziolek @mgartner TFTRs -- I ended up simplifying this doc, in part to address your comments. Please have a look!

Contributor

@mgartner mgartner left a comment

Looks great!

Member

@yuzefovich yuzefovich left a comment

Nice! I have a couple of suggestions to consider, which I don't have strong opinions on.

(1 row)
~~~

## Supported casting and conversion
Member

I think we support more types based on the code - it seems like most types that we can cast from STRING we can also cast from CITEXT. In the current form, this section suggests that only STRING -> CITEXT and CITEXT -> STRING casts are possible.

I wonder whether it'd be better to omit the section about casts altogether? (It doesn't look like we're being precise about documenting all the casts that we support, across a couple of types I looked at.)

Contributor Author

Thanks for noticing this -- in this case, I'll remove the Casting section from the doc for simplicity, since it's already mentioned in the example and the rest can be gleaned from STRING.

@taroface taroface requested a review from rmloveland July 31, 2025 20:24
Contributor

@rmloveland rmloveland left a comment

LGTM once it's added to the table of contents (unless that was intentional but I'm guessing not)

@@ -0,0 +1,98 @@
---
title: CITEXT
Contributor

Suggest adding this page to the table of contents (I assume leaving it out was not intentional).

docs_area: reference.sql
---

The `CITEXT` [data type]({% link {{ page.version.version }}/data-types.md %}) represents a case-insensitive string. Like `STRING` values, `CITEXT` values preserve their casing when stored and retrieved. Unlike `STRING` values, comparisons between `CITEXT` values are case-insensitive for all Unicode characters that have a defined uppercase/lowercase mapping (e.g., `'É' = 'é'`).
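
As an illustration of the behavior this paragraph describes (a sketch that is not part of the PR diff), the following queries contrast `STRING` and `CITEXT` comparisons using the `'É'`/`'é'` pair from the text. It assumes a CockroachDB version that includes the `CITEXT` type and uses only literals and casts, so no table or schema is assumed.

```sql
-- Sketch only: assumes a CockroachDB version that includes the CITEXT type.

-- STRING comparison is case-sensitive, so these two values should not be equal.
SELECT 'É'::STRING = 'é'::STRING;

-- CITEXT comparison is case-insensitive, so per the paragraph above these
-- two values should compare as equal.
SELECT 'É'::CITEXT = 'é'::CITEXT;

-- Casing is preserved: casting back to STRING should return the value as written.
SELECT 'Hello'::CITEXT::STRING;
```
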
Contributor

"Unicode characters" could be a link to e.g. https://en.wikipedia.org/wiki/List_of_Unicode_characters

5 participants