From 293a672327535dd633bda829340d12ea411845ac Mon Sep 17 00:00:00 2001 From: javier Date: Fri, 28 Feb 2025 17:19:45 +0100 Subject: [PATCH 01/13] first version of schema design essentials --- .../guides/schema-design-essentials.md | 240 ++++++++++++++++++ documentation/sidebars.js | 5 + 2 files changed, 245 insertions(+) create mode 100644 documentation/guides/schema-design-essentials.md diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md new file mode 100644 index 000000000..056ca831a --- /dev/null +++ b/documentation/guides/schema-design-essentials.md @@ -0,0 +1,240 @@ +--- +title: Schema Design Essentials +slug: /schema-design-essentials +description: + Learn how to design efficient schemas in QuestDB. This guide covers best practices for partitioning, indexing, symbols, timestamps, deduplication, retention strategies, and schema modifications to optimize performance in time-series workloads. +--- + +# Schema Design in QuestDB + +This guide covers key concepts and best practices to take full advantage of QuestDB’s performance-oriented architecture, highlighting some important differences with most databases. + +## QuestDB’s Single Database Model + +QuestDB has a **single database per instance**. Unlike PostgreSQL and other database engines, where you may have multiple databases or multiple schemas within an instance, in QuestDB, you operate within a single namespace. The default database is named `qdb`, and this can be changed via configuration. However, there is no need to issue `USE DATABASE` commands—once connected, you can immediately start querying and inserting data. + +### Multi-Tenancy Considerations + +If you need **multi-tenancy**, you will need to manage table names manually, often by using **prefixes** for different datasets. Since QuestDB does not support multiple schemas, this is the primary way to segment data. 
In QuestDB Enterprise, you can enforce permissions per table to restrict access, allowing finer control over multi-tenant environments. + +## PostgreSQL Protocol Compatibility + +QuestDB is **not** a PostgreSQL database but is **compatible with the PostgreSQL wire protocol**. This means you can connect using PostgreSQL-compatible libraries and clients and execute SQL commands. However, compatibility with PostgreSQL system catalogs, metadata queries, data types, +and functions is limited. + +While most PostgreSQL compatible low-level libraries will work with +QuestDB, some higher level components depending heavily on PostgreSQL metadata might fail. If you find one of such use cases, please do report it as an [issue on GitHub](https://github.com/questdb/questdb/issues) so we can track it. + +## Creating a Schema in QuestDB + +### Recommended Approach + +The easiest way to create a schema is through the **web interface** or by sending SQL commands using: + +- The **REST API** (`CREATE TABLE` statements) +- The **PostgreSQL wire protocol clients** + +### Schema Auto-Creation with ILP Protocol + +When using the **Influx Line Protocol (ILP)**, QuestDB can automatically create tables and columns based on incoming data. This is useful for users migrating from InfluxDB or using tools like **InfluxDB client libraries or Telegraf**, as they can send data directly to QuestDB without pre-defining schemas. However, this comes with limitations: + +- Auto-created tables and columns **use default settings** (e.g., default partitioning, symbol capacity, and data types). +- **You cannot easily modify partitioning or symbol capacity later**, so it is recommended to explicitly create tables beforehand. +- Auto-creation can be disabled via configuration. + +## The Designated Timestamp and Partitioning Strategy + +QuestDB is designed for the use case of time-series. Everything on the database engine is optimized to perform exceptionally well for time-series queries. 
One of the most important optimizations in QuestDB is that data is physically stored ordered by incremental timestamp. It is the responsibility of the user, at table creation time, to decide which timestamp column will be the **designatd timestamp**. + +The **designated timestamp** is one of the **most important decision** when creating a table in QuestDB. It determines: + +- **Partitioning strategy** (per hour, per day, per week, per month, or per year). +- **Physical data storage order**, as data is always stored **sorted by the designated timestamp**. +- **Query efficiency**, as QuestDB **prunes partitions** based on the timestamp range in your query, reducing disk I/O. +- **Insertion performance**, since data that arrives **out of order** may require rewriting partitions, slowing down ingestion. + +### Partitioning Guidelines + +When choosing the partition resolution for your tables, keep in mind which is the typical time-ranges that you will be querying most frequently, and consider the following: + +- **Avoid very large partitions**: A partition should be at most **a few gigabytes**. +- **Avoid too many small partitions**: Querying more partitions means opening more files. +- **Query efficiency**: When filtering data, QuestDB prunes partitions, but querying many partitions results in more disk operations. If most of your queries span a monthly range, weekly or daily partitioning sounds sensible, but hourly partitioning might slow down your queries. +- **Data ingestion performance**: If data arrives out of order, QuestDB rewrites the active partition, impacting performance. + +## Columnar Storage Model and Table Density + +QuestDB is **columnar**, meaning: + +- **Columns are stored separately**, allowing fast queries on specific columns without loading unnecessary data. 
+- **Each column is stored in one or two files per partition**: The more columns you include at any part of a `SELECT` and the most partitions the query spans, the more files will need to be open and cached into the working memory. + +### Sparse vs. Dense Tables + +- **QuestDB handles wide tables efficiently** due to its columnar architecture, as it will open only the column files referenced at each query. +- **Null values take storage space**, so it is recommended to avoid sparse tables where possible. +- **Dense tables** (where most columns have values) are more efficient in terms of storage and query performance. If you cannot design a dense table, you might want to create different tables for each different record structure. + +## Data Types and Best Practices + +### Symbols (Recommended for Categorical Data) + +QuestDB introduces a specialized `Symbol` data type. Symbols are **dictionary-encoded** and optimized for filtering and grouping: + +- Use symbols for **categorical data** with a limited number of unique values (e.g., country codes, stock tickers, factory floor IDs). +- Symbols are fine for **storing up to a few million distinct values** but should be avoided beyond that. +- Avoid using a **`SYMBOL`** for columns that would be considered a `PRIMARY KEY` in other databases. +- **If very high cardinality is expected**, use **`VARCHAR`** instead of **`SYMBOL`**. +- **Symbols are compact on disk**, reducing storage overhead. +- **Symbol capacity defaults to 256**, but it will dynamically expand as needed, causing temporary slowdowns. +- **If you expect high cardinality, define the symbol capacity at table creation time** to avoid performance issues. + +### Timestamps + +- **All timestamps in QuestDB are stored in UTC** at **Microsecond resolution**: Even if you can ingest data sending timestamps in nanoseconds, nanosecond precision is not retained. 
+- The **`TIMESTAMP`** type is recommended over **`DATETIME`**, unless you have checked the data types reference and you know what you are doing. +- **At query time, you can apply a time zone conversion for display purposes**. + +### Strings vs. VARCHAR + +- Avoid **`STRING`**: It is a legacy data type. +- Use **`VARCHAR`** instead for general string storage. + +### UUIDs + +- QuestDB has a dedicated **`UUID`** type, which is more efficient than storing UUIDs as `VARCHAR`. + +### Other Data Types + +- **Booleans**: `true`/`false` values are supported). +- **Bytes**: `BYTES` type allows storing raw binary data. +- **IPv4**" QuestDB has a dedicated `IPv4` type for optimized IP storage and filtering. +- **Several numeric datatypes** are supported. +- **Geo**: QuestDB provides spatial support via geohashes. + +## Referential Integrity, Constraints, and Deduplication + +- QuestDB **does not enforce** `primary keys`, `foreign keys`, or **`NOT NULL`** constraints. +- **Joins between tables work even without referential integrity**, as long as the data types on the join condition are compatible. +- **Duplicate data is allowed by default**, but `UPSERT` keys can be defined to **ensure uniqueness**. +- **Deduplication in QuestDB happens on an exact timestamp and optionally a set of other columns (`UPSERT KEYS`)**. +- **Deduplication has no noticeable performance penalty**. + + +## Retention Strategies with TTL and Materialized Views + +Since **individual row deletions are not supported**, data retention is managed via: + +- **Partition expiration**: Define a **TTL (Time-To-Live)** retention period per table. +- **Materialized Views**: QuestDB allows creating **auto-refreshing materialized views** to store aggregated data at lower granularity while applying optional expiration via TTL on the base table. Materialized Views can also define a TTL for their data. 
+ +## Schema Decisions That Cannot Be Easily Changed + +Some table properties **cannot be modified after creation**, including: + +- **The designated timestamp** (cannot be altered once set). +- **Partitioning strategy** (cannot be changed later). +- **Symbol capacity** (must be defined upfront, otherwise defaults apply). + +For changes, the typical workaround is: + +1. Create a **new column** with the updated configuration. +2. Copy data from the old column into the new one. +3. Drop the old column and rename the new one. +4. **If changes affect table-wide properties** (e.g., partitioning, timestamp column, or WAL settings), create a new table with the required properties, insert data from the old table, drop the old table, and rename the new table. + +## Examples of Schema Translations from Other Databases + +```sql +-- PostgreSQL +CREATE TABLE metrics ( + timestamp TIMESTAMP PRIMARY KEY, + name VARCHAR(255) NOT NULL, + description VARCHAR(500), + unit VARCHAR(50), + id UUID PRIMARY KEY, + value DOUBLE PRECISION NOT NULL +); +CREATE INDEX ON metrics (name, timestamp); + +-- UPSERT behavior in PostgreSQL +INSERT INTO metrics (timestamp, name, description, unit, id, value) +VALUES (...) +ON CONFLICT (timestamp, name) DO UPDATE +SET description = EXCLUDED.description, + unit = EXCLUDED.unit, + value = EXCLUDED.value; +``` + +```sql +-- TimescaleDB +CREATE TABLE metrics ( + timestamp TIMESTAMPTZ NOT NULL, + name VARCHAR(255) NOT NULL, + description VARCHAR(500), + unit VARCHAR(50), + id UUID PRIMARY KEY, + value DOUBLE PRECISION NOT NULL +); +SELECT create_hypertable('metrics', 'timestamp'); +CREATE INDEX ON metrics (name, timestamp); + +-- UPSERT behavior in TimescaleDB +INSERT INTO metrics (timestamp, name, description, unit, id, value) +VALUES (...) 
+ON CONFLICT (timestamp, name) DO UPDATE +SET description = EXCLUDED.description, + unit = EXCLUDED.unit, + value = EXCLUDED.value; +``` + +```sql +-- DuckDB +CREATE TABLE metrics ( + timestamp TIMESTAMP NOT NULL, + name VARCHAR(255) NOT NULL, + description VARCHAR(500), + unit VARCHAR(50), + id UUID PRIMARY KEY, + value DOUBLE NOT NULL +); +CREATE INDEX ON metrics (name, timestamp); +``` + +```sql +-- ClickHouse +CREATE TABLE metrics ( + timestamp DateTime, + name String, + description String, + unit String, + id UUID, + value Float64 +) ENGINE = MergeTree() +ORDER BY (name, timestamp); + +-- InfluxDB Measurement +measurement: metrics +name (tag) +description (tag) +unit (tag) +id (tag) +value (field) +``` + +```sql +-- QuestDB Equivalent +CREATE TABLE metrics ( + timestamp TIMESTAMP, -- Explicit timestamp for time-series queries + name SYMBOL CAPACITY 50000, -- Optimized for high-cardinality categorical values + description VARCHAR, -- Free-text description, not ideal for SYMBOL indexing + unit SYMBOL CAPACITY 256, -- Limited set of unit types, efficient as SYMBOL + id UUID, -- UUID optimized for unique identifiers + value DOUBLE -- Numeric measurement field +) TIMESTAMP(timestamp) +PARTITION BY DAY WAL +DEDUP UPSERT KEYS(timestamp, name); +``` + + + diff --git a/documentation/sidebars.js b/documentation/sidebars.js index 9f7518df7..5219fab89 100644 --- a/documentation/sidebars.js +++ b/documentation/sidebars.js @@ -14,6 +14,11 @@ module.exports = { type: "doc", customProps: { tag: "Popular" }, }, + { + id: "guides/schema-design-essentials", + type: "doc", + customProps: { tag: "Popular" }, + }, { id: "guides/influxdb-migration", type: "doc", From ee1634c8474441fc680b50c1616dd515855703dc Mon Sep 17 00:00:00 2001 From: javier Date: Fri, 28 Feb 2025 17:29:39 +0100 Subject: [PATCH 02/13] questdb sql highlighting --- documentation/guides/schema-design-essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git 
a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index 056ca831a..e6b48bdbd 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -222,7 +222,7 @@ id (tag) value (field) ``` -```sql +```questdb-sql -- QuestDB Equivalent CREATE TABLE metrics ( timestamp TIMESTAMP, -- Explicit timestamp for time-series queries From d39cf0c0c0bb578778ce3242cbb9905a770ec016 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?javier=20ram=C3=ADrez?= Date: Mon, 3 Mar 2025 11:43:35 +0100 Subject: [PATCH 03/13] Update documentation/guides/schema-design-essentials.md Co-authored-by: goodroot <9484709+goodroot@users.noreply.github.com> --- documentation/guides/schema-design-essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index e6b48bdbd..2b5bf72cd 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -1,6 +1,6 @@ --- title: Schema Design Essentials -slug: /schema-design-essentials +slug: schema-design-essentials description: Learn how to design efficient schemas in QuestDB. This guide covers best practices for partitioning, indexing, symbols, timestamps, deduplication, retention strategies, and schema modifications to optimize performance in time-series workloads. 
--- From 527be4cc117017ae1016afcb216a4fa440ff29ba Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?javier=20ram=C3=ADrez?= Date: Mon, 3 Mar 2025 11:43:52 +0100 Subject: [PATCH 04/13] Update documentation/guides/schema-design-essentials.md Co-authored-by: goodroot <9484709+goodroot@users.noreply.github.com> --- documentation/guides/schema-design-essentials.md | 1 - 1 file changed, 1 deletion(-) diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index 2b5bf72cd..0ead33d61 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -5,7 +5,6 @@ description: Learn how to design efficient schemas in QuestDB. This guide covers best practices for partitioning, indexing, symbols, timestamps, deduplication, retention strategies, and schema modifications to optimize performance in time-series workloads. --- -# Schema Design in QuestDB This guide covers key concepts and best practices to take full advantage of QuestDB’s performance-oriented architecture, highlighting some important differences with most databases. 
From c32caad32fc45cc0ecd697c9d1fc4e80558803c8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?javier=20ram=C3=ADrez?= Date: Mon, 3 Mar 2025 11:44:12 +0100 Subject: [PATCH 05/13] Update documentation/guides/schema-design-essentials.md breaking into paragraphs Co-authored-by: goodroot <9484709+goodroot@users.noreply.github.com> --- documentation/guides/schema-design-essentials.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index 0ead33d61..9b7834f03 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -10,7 +10,9 @@ This guide covers key concepts and best practices to take full advantage of Ques ## QuestDB’s Single Database Model -QuestDB has a **single database per instance**. Unlike PostgreSQL and other database engines, where you may have multiple databases or multiple schemas within an instance, in QuestDB, you operate within a single namespace. The default database is named `qdb`, and this can be changed via configuration. However, there is no need to issue `USE DATABASE` commands—once connected, you can immediately start querying and inserting data. +QuestDB has a **single database per instance**. Unlike PostgreSQL and other database engines, where you may have multiple databases or multiple schemas within an instance, in QuestDB, you operate within a single namespace. + +The default database is named `qdb`, and this can be changed via configuration. However, unlike a standard SQL database, there is no need to issue `USE DATABASE` commands—once connected. You can immediately start querying and inserting data. 
### Multi-Tenancy Considerations From 409aa737acd5d45f4b5c8314b7ca26f0876691ac Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?javier=20ram=C3=ADrez?= Date: Mon, 3 Mar 2025 11:44:34 +0100 Subject: [PATCH 06/13] Update documentation/guides/schema-design-essentials.md link Co-authored-by: goodroot <9484709+goodroot@users.noreply.github.com> --- documentation/guides/schema-design-essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index 9b7834f03..cf5f61396 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -16,7 +16,7 @@ The default database is named `qdb`, and this can be changed via configuration. ### Multi-Tenancy Considerations -If you need **multi-tenancy**, you will need to manage table names manually, often by using **prefixes** for different datasets. Since QuestDB does not support multiple schemas, this is the primary way to segment data. In QuestDB Enterprise, you can enforce permissions per table to restrict access, allowing finer control over multi-tenant environments. +If you need **multi-tenancy**, you will need to manage table names manually, often by using **prefixes** for different datasets. Since QuestDB does not support multiple schemas, this is the primary way to segment data. In QuestDB Enterprise, you can [enforce permissions per table to restrict access](/docs/operations/rbac/), allowing finer control over multi-tenant environments. 
## PostgreSQL Protocol Compatibility From 58fec7e83b682b631b37f64b48f6e64f4faaf92d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?javier=20ram=C3=ADrez?= Date: Mon, 3 Mar 2025 11:44:46 +0100 Subject: [PATCH 07/13] Update documentation/guides/schema-design-essentials.md link Co-authored-by: goodroot <9484709+goodroot@users.noreply.github.com> --- documentation/guides/schema-design-essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index cf5f61396..e95814e57 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -30,7 +30,7 @@ QuestDB, some higher level components depending heavily on PostgreSQL metadata m ### Recommended Approach -The easiest way to create a schema is through the **web interface** or by sending SQL commands using: +The easiest way to create a schema is through the **[Web Console](/docs/web-console/)** or by sending SQL commands using: - The **REST API** (`CREATE TABLE` statements) - The **PostgreSQL wire protocol clients** From 0a1764034227ab3a40ea8f91e13a8ff7ed9682cb Mon Sep 17 00:00:00 2001 From: javier Date: Mon, 3 Mar 2025 13:29:56 +0100 Subject: [PATCH 08/13] active voice, section titles, improved examples --- .../guides/schema-design-essentials.md | 128 ++++++++++-------- 1 file changed, 68 insertions(+), 60 deletions(-) diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index e95814e57..5c4aa924b 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -1,86 +1,85 @@ --- -title: Schema Design Essentials + +title: Schema design essentials slug: schema-design-essentials description: - Learn how to design efficient schemas in QuestDB. 
This guide covers best practices for partitioning, indexing, symbols, timestamps, deduplication, retention strategies, and schema modifications to optimize performance in time-series workloads. +  Learn how to design efficient schemas in QuestDB. This guide covers best practices for partitioning, indexing, symbols, timestamps, deduplication, retention strategies, and schema modifications to optimize performance in time-series workloads. --- - This guide covers key concepts and best practices to take full advantage of QuestDB’s performance-oriented architecture, highlighting some important differences with most databases. -## QuestDB’s Single Database Model +## QuestDB’s single database model -QuestDB has a **single database per instance**. Unlike PostgreSQL and other database engines, where you may have multiple databases or multiple schemas within an instance, in QuestDB, you operate within a single namespace. +QuestDB has a **single database per instance**. Unlike PostgreSQL and other database engines, where you may have multiple databases or multiple schemas within an instance, in QuestDB, you operate within a single namespace. -The default database is named `qdb`, and this can be changed via configuration. However, unlike a standard SQL database, there is no need to issue `USE DATABASE` commands—once connected. You can immediately start querying and inserting data. +The default database is named `qdb`, and this can be changed via configuration. However, unlike a standard SQL database, there is no need to issue `USE DATABASE` commands—once connected, you can immediately start querying and inserting data. -### Multi-Tenancy Considerations +### Multi-tenancy considerations -If you need **multi-tenancy**, you will need to manage table names manually, often by using **prefixes** for different datasets. Since QuestDB does not support multiple schemas, this is the primary way to segment data.
In QuestDB Enterprise, you can [enforce permissions per table to restrict access](/docs/operations/rbac/), allowing finer control over multi-tenant environments. +If you need **multi-tenancy**, you must manage table names manually, often by using **prefixes** for different datasets. Since QuestDB does not support multiple schemas, this is the primary way to segment data. In QuestDB Enterprise, you can [enforce permissions per table to restrict access](/docs/operations/rbac/), allowing finer control over multi-tenant environments. -## PostgreSQL Protocol Compatibility +## PostgreSQL protocol compatibility -QuestDB is **not** a PostgreSQL database but is **compatible with the PostgreSQL wire protocol**. This means you can connect using PostgreSQL-compatible libraries and clients and execute SQL commands. However, compatibility with PostgreSQL system catalogs, metadata queries, data types, -and functions is limited. +QuestDB is **not** a PostgreSQL database but is **compatible with the PostgreSQL wire protocol**. This means you can connect using PostgreSQL-compatible libraries and clients and execute SQL commands. However, compatibility with PostgreSQL system catalogs, metadata queries, data types, and functions is limited. -While most PostgreSQL compatible low-level libraries will work with -QuestDB, some higher level components depending heavily on PostgreSQL metadata might fail. If you find one of such use cases, please do report it as an [issue on GitHub](https://github.com/questdb/questdb/issues) so we can track it. +While most PostgreSQL-compatible low-level libraries work with QuestDB, some higher-level components that depend heavily on PostgreSQL metadata might fail. If you encounter such a case, please report it as an [issue on GitHub](https://github.com/questdb/questdb/issues) so we can track it. 
-## Creating a Schema in QuestDB +## Creating a schema in QuestDB -### Recommended Approach +### Recommended approach The easiest way to create a schema is through the **[Web Console](/docs/web-console/)** or by sending SQL commands using: - The **REST API** (`CREATE TABLE` statements) - The **PostgreSQL wire protocol clients** -### Schema Auto-Creation with ILP Protocol +### Schema auto-creation with ILP protocol -When using the **Influx Line Protocol (ILP)**, QuestDB can automatically create tables and columns based on incoming data. This is useful for users migrating from InfluxDB or using tools like **InfluxDB client libraries or Telegraf**, as they can send data directly to QuestDB without pre-defining schemas. However, this comes with limitations: +When using the **Influx Line Protocol (ILP)**, QuestDB automatically creates tables and columns based on incoming data. This is useful for users migrating from InfluxDB or using tools like **InfluxDB client libraries or Telegraf**, as they can send data directly to QuestDB without pre-defining schemas. However, this comes with limitations: -- Auto-created tables and columns **use default settings** (e.g., default partitioning, symbol capacity, and data types). -- **You cannot easily modify partitioning or symbol capacity later**, so it is recommended to explicitly create tables beforehand. +- QuestDB applies **default settings** to auto-created tables and columns (e.g., partitioning, symbol capacity, and data types). +- Users **cannot modify partitioning or symbol capacity later**, so they should create tables explicitly beforehand. - Auto-creation can be disabled via configuration. -## The Designated Timestamp and Partitioning Strategy +## The designated timestamp and partitioning strategy -QuestDB is designed for the use case of time-series. Everything on the database engine is optimized to perform exceptionally well for time-series queries. 
One of the most important optimizations in QuestDB is that data is physically stored ordered by incremental timestamp. It is the responsibility of the user, at table creation time, to decide which timestamp column will be the **designatd timestamp**. +QuestDB is designed for time-series workloads. The database engine is optimized to perform exceptionally well for time-series queries. One of the most important optimizations in QuestDB is that data is physically stored ordered by incremental timestamp. The user must choose the **designated timestamp** when creating a table. -The **designated timestamp** is one of the **most important decision** when creating a table in QuestDB. It determines: +The **designated timestamp** is crucial in QuestDB. It directly affects: -- **Partitioning strategy** (per hour, per day, per week, per month, or per year). +- **How QuestDB partitions data** (by hour, day, week, month, or year). - **Physical data storage order**, as data is always stored **sorted by the designated timestamp**. -- **Query efficiency**, as QuestDB **prunes partitions** based on the timestamp range in your query, reducing disk I/O. -- **Insertion performance**, since data that arrives **out of order** may require rewriting partitions, slowing down ingestion. +- **Query efficiency**, since QuestDB **prunes partitions** based on the timestamp range in your query, reducing disk I/O. +- **Insertion performance**, because **out-of-order data forces QuestDB to rewrite partitions**, slowing down ingestion. -### Partitioning Guidelines +### Partitioning guidelines -When choosing the partition resolution for your tables, keep in mind which is the typical time-ranges that you will be querying most frequently, and consider the following: +When choosing the partition resolution for your tables, consider the time ranges you will query most frequently and keep in mind the following: - **Avoid very large partitions**: A partition should be at most **a few gigabytes**. 
- **Avoid too many small partitions**: Querying more partitions means opening more files. - **Query efficiency**: When filtering data, QuestDB prunes partitions, but querying many partitions results in more disk operations. If most of your queries span a monthly range, weekly or daily partitioning sounds sensible, but hourly partitioning might slow down your queries. - **Data ingestion performance**: If data arrives out of order, QuestDB rewrites the active partition, impacting performance. -## Columnar Storage Model and Table Density +## Columnar storage model and table density QuestDB is **columnar**, meaning: - **Columns are stored separately**, allowing fast queries on specific columns without loading unnecessary data. -- **Each column is stored in one or two files per partition**: The more columns you include at any part of a `SELECT` and the most partitions the query spans, the more files will need to be open and cached into the working memory. +- **Each column is stored in one or two files per partition**: The more columns you include in a `SELECT` and the more partitions the query spans, the more files will need to be opened and cached into working memory. -### Sparse vs. Dense Tables +### Sparse vs. dense tables -- **QuestDB handles wide tables efficiently** due to its columnar architecture, as it will open only the column files referenced at each query. +- **QuestDB handles wide tables efficiently** due to its columnar architecture, as it will open only the column files referenced in each query. - **Null values take storage space**, so it is recommended to avoid sparse tables where possible. -- **Dense tables** (where most columns have values) are more efficient in terms of storage and query performance. If you cannot design a dense table, you might want to create different tables for each different record structure. +- **Dense tables** (where most columns have values) are more efficient in terms of storage and query performance. 
If you cannot design a dense table, consider creating different tables for distinct record structures. -## Data Types and Best Practices -### Symbols (Recommended for Categorical Data) +## Data types and best practices -QuestDB introduces a specialized `Symbol` data type. Symbols are **dictionary-encoded** and optimized for filtering and grouping: +### Symbols (recommended for categorical data) + +QuestDB introduces a specialized `SYMBOL` data type. Symbols are **dictionary-encoded** and optimized for filtering and grouping: - Use symbols for **categorical data** with a limited number of unique values (e.g., country codes, stock tickers, factory floor IDs). - Symbols are fine for **storing up to a few million distinct values** but should be avoided beyond that. @@ -96,7 +95,7 @@ QuestDB introduces a specialized `Symbol` data type. Symbols are **dictionary-en - The **`TIMESTAMP`** type is recommended over **`DATETIME`**, unless you have checked the data types reference and you know what you are doing. - **At query time, you can apply a time zone conversion for display purposes**. -### Strings vs. VARCHAR +### Strings vs. varchar - Avoid **`STRING`**: It is a legacy data type. - Use **`VARCHAR`** instead for general string storage. @@ -105,31 +104,30 @@ QuestDB introduces a specialized `Symbol` data type. Symbols are **dictionary-en - QuestDB has a dedicated **`UUID`** type, which is more efficient than storing UUIDs as `VARCHAR`. -### Other Data Types +### Other data types -- **Booleans**: `true`/`false` values are supported). +- **Booleans**: `true`/`false` values are supported. - **Bytes**: `BYTES` type allows storing raw binary data. -- **IPv4**" QuestDB has a dedicated `IPv4` type for optimized IP storage and filtering. +- **IPv4**: QuestDB has a dedicated `IPv4` type for optimized IP storage and filtering. - **Several numeric datatypes** are supported. - **Geo**: QuestDB provides spatial support via geohashes. 
-## Referential Integrity, Constraints, and Deduplication +## Referential integrity, constraints, and deduplication -- QuestDB **does not enforce** `primary keys`, `foreign keys`, or **`NOT NULL`** constraints. +- QuestDB **does not enforce** `PRIMARY KEYS`, `FOREIGN KEYS`, or **`NOT NULL`** constraints. - **Joins between tables work even without referential integrity**, as long as the data types on the join condition are compatible. -- **Duplicate data is allowed by default**, but `UPSERT` keys can be defined to **ensure uniqueness**. +- **Duplicate data is allowed by default**, but `UPSERT KEYS` can be defined to **ensure uniqueness**. - **Deduplication in QuestDB happens on an exact timestamp and optionally a set of other columns (`UPSERT KEYS`)**. - **Deduplication has no noticeable performance penalty**. - -## Retention Strategies with TTL and Materialized Views +## Retention strategies with TTL and materialized views Since **individual row deletions are not supported**, data retention is managed via: -- **Partition expiration**: Define a **TTL (Time-To-Live)** retention period per table. -- **Materialized Views**: QuestDB allows creating **auto-refreshing materialized views** to store aggregated data at lower granularity while applying optional expiration via TTL on the base table. Materialized Views can also define a TTL for their data. +- **Setting a TTL retention period** per table to control partition expiration. +- **Materialized views**: QuestDB **automatically refreshes** materialized views, storing aggregated data at lower granularity. You can also apply TTL expiration on the base table. -## Schema Decisions That Cannot Be Easily Changed +## Schema decisions that cannot be easily changed Some table properties **cannot be modified after creation**, including: @@ -144,9 +142,10 @@ For changes, the typical workaround is: 3. Drop the old column and rename the new one. 4. 
**If changes affect table-wide properties** (e.g., partitioning, timestamp column, or WAL settings), create a new table with the required properties, insert data from the old table, drop the old table, and rename the new table. -## Examples of Schema Translations from Other Databases -```sql +## Examples of schema translations from other databases + +```questdb-sql title="Create sample table with deduplication/upsert for PostgreSQL" -- PostgreSQL CREATE TABLE metrics ( timestamp TIMESTAMP PRIMARY KEY, @@ -167,8 +166,8 @@ SET description = EXCLUDED.description, value = EXCLUDED.value; ``` -```sql --- TimescaleDB +```questdb-sql title="Create sample table with deduplication/upsert for Timescale" +-- Timescale CREATE TABLE metrics ( timestamp TIMESTAMPTZ NOT NULL, name VARCHAR(255) NOT NULL, @@ -180,7 +179,7 @@ CREATE TABLE metrics ( SELECT create_hypertable('metrics', 'timestamp'); CREATE INDEX ON metrics (name, timestamp); --- UPSERT behavior in TimescaleDB +-- UPSERT behavior in Timescale INSERT INTO metrics (timestamp, name, description, unit, id, value) VALUES (...) ON CONFLICT (timestamp, name) DO UPDATE @@ -189,7 +188,7 @@ SET description = EXCLUDED.description, value = EXCLUDED.value; ``` -```sql +```questdb-sql title="Create sample table with deduplication/upsert for DuckDB" -- DuckDB CREATE TABLE metrics ( timestamp TIMESTAMP NOT NULL, @@ -199,10 +198,19 @@ CREATE TABLE metrics ( id UUID PRIMARY KEY, value DOUBLE NOT NULL ); + CREATE INDEX ON metrics (name, timestamp); + +-- UPSERT behavior in DuckDB +INSERT INTO metrics (timestamp, name, description, unit, id, value) +VALUES (?, ?, ?, ?, ?, ?)
+ON CONFLICT (timestamp, name) DO UPDATE +SET description = EXCLUDED.description, + unit = EXCLUDED.unit, + value = EXCLUDED.value; ``` -```sql +```questdb-sql title="Create sample table with eventual upserts for ClickHouse" -- ClickHouse CREATE TABLE metrics ( timestamp DateTime, @@ -211,10 +219,12 @@ CREATE TABLE metrics ( unit String, id UUID, value Float64 -) ENGINE = MergeTree() +) ENGINE = ReplacingMergeTree ORDER BY (name, timestamp); +``` --- InfluxDB Measurement +```questdb-sql title="Create sample measurement (table) for InfluxDB" +-- InfluxDB measurement measurement: metrics name (tag) description (tag) @@ -223,8 +233,8 @@ id (tag) value (field) ``` -```questdb-sql --- QuestDB Equivalent +```questdb-sql title="Create sample table with deduplication/upsert for QuestDB" +-- QuestDB equivalent CREATE TABLE metrics ( timestamp TIMESTAMP, -- Explicit timestamp for time-series queries name SYMBOL CAPACITY 50000, -- Optimized for high-cardinality categorical values @@ -237,5 +247,3 @@ PARTITION BY DAY WAL DEDUP UPSERT KEYS(timestamp, name); ``` - - From 2d74970a17304a1389609b218952c312500361fe Mon Sep 17 00:00:00 2001 From: javier Date: Mon, 3 Mar 2025 13:34:06 +0100 Subject: [PATCH 09/13] abstract indentation --- documentation/guides/schema-design-essentials.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index 5c4aa924b..57548de3f 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -1,9 +1,8 @@ --- - title: Schema design essentials slug: schema-design-essentials description: -Learn how to design efficient schemas in QuestDB. This guide covers best practices for partitioning, indexing, symbols, timestamps, deduplication, retention strategies, and schema modifications to optimize performance in time-series workloads. + Learn how to design efficient schemas in QuestDB.
This guide covers best practices for partitioning, indexing, symbols, timestamps, deduplication, retention strategies, and schema modifications to optimize performance in time-series workloads --- This guide covers key concepts and best practices to take full advantage of QuestDB’s performance-oriented architecture, highlighting some important differences with most databases. From d76d1d6b26ea940f897dc9cda3153dfbf3d5c96d Mon Sep 17 00:00:00 2001 From: javier Date: Mon, 3 Mar 2025 17:42:29 +0100 Subject: [PATCH 10/13] adding links to many internal pages --- .../guides/schema-design-essentials.md | 44 +++++++++---------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index 57548de3f..c6f6d5b38 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -19,7 +19,7 @@ If you need **multi-tenancy**, you must manage table names manually, often by us ## PostgreSQL protocol compatibility -QuestDB is **not** a PostgreSQL database but is **compatible with the PostgreSQL wire protocol**. This means you can connect using PostgreSQL-compatible libraries and clients and execute SQL commands. However, compatibility with PostgreSQL system catalogs, metadata queries, data types, and functions is limited. +QuestDB is **not** a PostgreSQL database but is **compatible with the [PostgreSQL wire protocol](/docs/reference/api/postgres/)**. This means you can connect using PostgreSQL-compatible libraries and clients and execute SQL commands. However, compatibility with PostgreSQL system catalogs, metadata queries, data types, and functions is limited. While most PostgreSQL-compatible low-level libraries work with QuestDB, some higher-level components that depend heavily on PostgreSQL metadata might fail. 
If you encounter such a case, please report it as an [issue on GitHub](https://github.com/questdb/questdb/issues) so we can track it. @@ -29,20 +29,20 @@ While most PostgreSQL-compatible low-level libraries work with QuestDB, some hig The easiest way to create a schema is through the **[Web Console](/docs/web-console/)** or by sending SQL commands using: -- The **REST API** (`CREATE TABLE` statements) -- The **PostgreSQL wire protocol clients** +- The [**REST API**](/docs/reference/api/rest/) (`CREATE TABLE` statements) +- The **[PostgreSQL wire protocol](/docs/reference/api/postgres/) clients** ### Schema auto-creation with ILP protocol -When using the **Influx Line Protocol (ILP)**, QuestDB automatically creates tables and columns based on incoming data. This is useful for users migrating from InfluxDB or using tools like **InfluxDB client libraries or Telegraf**, as they can send data directly to QuestDB without pre-defining schemas. However, this comes with limitations: +When using the **[Influx Line Protocol](/docs/reference/api/ilp/overview/) (ILP)**, QuestDB automatically creates tables and columns based on incoming data. This is useful for users migrating from InfluxDB or using tools like **InfluxDB client libraries or Telegraf**, as they can send data directly to QuestDB without pre-defining schemas. However, this comes with limitations: -- QuestDB applies **default settings** to auto-created tables and columns (e.g., partitioning, symbol capacity, and data types). -- Users **cannot modify partitioning or symbol capacity later**, so they should create tables explicitly beforehand. -- Auto-creation can be disabled via configuration. +- QuestDB applies **[default settings](/docs/configuration/)** to auto-created tables and columns (e.g., partitioning, symbol capacity, and data types). 
+- Users **cannot modify [partitioning](/docs/concept/partitions/) or [symbol capacity](/docs/concept/symbol/#usage-of-symbols) later**, so they should create tables explicitly beforehand. +- Auto-creation can be [disabled via configuration](/docs/configuration/#influxdb-line-protocol-ilp). ## The designated timestamp and partitioning strategy -QuestDB is designed for time-series workloads. The database engine is optimized to perform exceptionally well for time-series queries. One of the most important optimizations in QuestDB is that data is physically stored ordered by incremental timestamp. The user must choose the **designated timestamp** when creating a table. +QuestDB is designed for time-series workloads. The database engine is optimized to perform exceptionally well for time-series queries. One of the most important optimizations in QuestDB is that data is physically stored ordered by incremental timestamp. The user must choose the **[designated timestamp](/docs/concept/designated-timestamp/)** when creating a table. The **designated timestamp** is crucial in QuestDB. It directly affects: @@ -53,7 +53,7 @@ The **designated timestamp** is crucial in QuestDB. It directly affects: ### Partitioning guidelines -When choosing the partition resolution for your tables, consider the time ranges you will query most frequently and keep in mind the following: +When choosing the [partition](/docs/concept/partitions/) resolution for your tables, consider the time ranges you will query most frequently and keep in mind the following: - **Avoid very large partitions**: A partition should be at most **a few gigabytes**. - **Avoid too many small partitions**: Querying more partitions means opening more files. 
@@ -62,7 +62,7 @@ When choosing the partition resolution for your tables, consider the time ranges ## Columnar storage model and table density -QuestDB is **columnar**, meaning: +QuestDB is **[columnar](/glossary/columnar-database/)**, meaning: - **Columns are stored separately**, allowing fast queries on specific columns without loading unnecessary data. - **Each column is stored in one or two files per partition**: The more columns you include in a `SELECT` and the more partitions the query spans, the more files will need to be opened and cached into working memory. @@ -70,7 +70,7 @@ QuestDB is **columnar**, meaning: ### Sparse vs. dense tables - **QuestDB handles wide tables efficiently** due to its columnar architecture, as it will open only the column files referenced in each query. -- **Null values take storage space**, so it is recommended to avoid sparse tables where possible. +- **Null values take [storage space](/docs/reference/sql/datatypes/#type-nullability)**, so it is recommended to avoid sparse tables where possible. - **Dense tables** (where most columns have values) are more efficient in terms of storage and query performance. If you cannot design a dense table, consider creating different tables for distinct record structures. @@ -78,7 +78,7 @@ QuestDB is **columnar**, meaning: ### Symbols (recommended for categorical data) -QuestDB introduces a specialized `SYMBOL` data type. Symbols are **dictionary-encoded** and optimized for filtering and grouping: +QuestDB introduces a specialized [`SYMBOL`](/docs/concept/symbol) data type. Symbols are **dictionary-encoded** and optimized for filtering and grouping: - Use symbols for **categorical data** with a limited number of unique values (e.g., country codes, stock tickers, factory floor IDs). - Symbols are fine for **storing up to a few million distinct values** but should be avoided beyond that. @@ -96,26 +96,26 @@ QuestDB introduces a specialized `SYMBOL` data type. 
Symbols are **dictionary-en ### Strings vs. varchar -- Avoid **`STRING`**: It is a legacy data type. +- Avoid **[`STRING`](/docs/reference/sql/datatypes/#varchar-and-string-considerations)**: It is a legacy data type. - Use **`VARCHAR`** instead for general string storage. ### UUIDs -- QuestDB has a dedicated **`UUID`** type, which is more efficient than storing UUIDs as `VARCHAR`. +- QuestDB has a dedicated **[`UUID`](/blog/uuid-coordination-free-unique-keys/)** type, which is more efficient than storing UUIDs as `VARCHAR`. ### Other data types - **Booleans**: `true`/`false` values are supported. - **Bytes**: `BYTES` type allows storing raw binary data. - **IPv4**: QuestDB has a dedicated `IPv4` type for optimized IP storage and filtering. -- **Several numeric datatypes** are supported. -- **Geo**: QuestDB provides spatial support via geohashes. +- **Several [numeric datatypes](/docs/reference/sql/datatypes)** are supported. +- **Geo**: QuestDB provides [spatial support via geohashes](/docs/concept/geohashes/). ## Referential integrity, constraints, and deduplication - QuestDB **does not enforce** `PRIMARY KEYS`, `FOREIGN KEYS`, or **`NOT NULL`** constraints. -- **Joins between tables work even without referential integrity**, as long as the data types on the join condition are compatible. -- **Duplicate data is allowed by default**, but `UPSERT KEYS` can be defined to **ensure uniqueness**. +- **Joins between tables work even without referential integrity**, as long as the data types on the [join condition](/docs/reference/sql/join/) are compatible. +- **[Duplicate data](/docs/concept/deduplication/) is allowed by default**, but `UPSERT KEYS` can be defined to **ensure uniqueness**. - **Deduplication in QuestDB happens on an exact timestamp and optionally a set of other columns (`UPSERT KEYS`)**. - **Deduplication has no noticeable performance penalty**. @@ -123,8 +123,8 @@ QuestDB introduces a specialized `SYMBOL` data type. 
Symbols are **dictionary-en Since **individual row deletions are not supported**, data retention is managed via: -- **Setting a TTL retention period** per table to control partition expiration. -- **Materialized views**: QuestDB **automatically refreshes** materialized views, storing aggregated data at lower granularity. You can also apply TTL expiration on the base table. +- **Setting a [TTL retention](/docs/concept/ttl) period** per table to control partition expiration. +- **Materialized views**: QuestDB **automatically refreshes** [materialized views](/reference/sql/create-mat-view/), storing aggregated data at lower granularity. You can also apply TTL expiration on the base table. ## Schema decisions that cannot be easily changed @@ -137,9 +137,9 @@ Some table properties **cannot be modified after creation**, including: For changes, the typical workaround is: 1. Create a **new column** with the updated configuration. -2. Copy data from the old column into the new one. +2. [Copy data](/reference/sql/update/) from the old column into the new one. 3. Drop the old column and rename the new one. -4. **If changes affect table-wide properties** (e.g., partitioning, timestamp column, or WAL settings), create a new table with the required properties, insert data from the old table, drop the old table, and rename the new table. +4. **If changes affect table-wide properties** (e.g., partitioning, timestamp column, or WAL settings), create a new table with the required properties, [insert data from the old table](/reference/sql/insert/#inserting-query-results), drop the old table, and rename the new table. 
## Examples of schema translations from other databases From b7f47f2b628efa11b4b6705b65d502eb25e9f5f7 Mon Sep 17 00:00:00 2001 From: goodroot <9484709+goodroot@users.noreply.github.com> Date: Mon, 3 Mar 2025 10:07:18 -0800 Subject: [PATCH 11/13] edit & stub --- .../guides/schema-design-essentials.md | 91 +++++++++++++++++-- 1 file changed, 83 insertions(+), 8 deletions(-) diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index c6f6d5b38..a14c43183 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -5,18 +5,94 @@ description: Learn how to design efficient schemas in QuestDB. This guide covers best practices for partitioning, indexing, symbols, timestamps, deduplication, retention strategies, and schema modifications to optimize performance in time-series workloads --- -This guide covers key concepts and best practices to take full advantage of QuestDB’s performance-oriented architecture, highlighting some important differences with most databases. +This guide covers key concepts and best practices to take full advantage of QuestDB's performance-oriented architecture, highlighting some important differences with most databases. -## QuestDB’s single database model +## QuestDB's single database model QuestDB has a **single database per instance**. Unlike PostgreSQL and other database engines, where you may have multiple databases or multiple schemas within an instance, in QuestDB, you operate within a single namespace. -The default database is named `qdb`, and this can be changed via configuration. However, unlike a standard SQL database, there is no need to issue `USE DATABASE` commands—once connected, you can immediately start querying and inserting data. +The default database is named `qdb`, and this can be changed via configuration. However, unlike a standard SQL database, there is no need to issue `USE DATABASE` commands. 
Once connected, you can immediately start querying and inserting data. ### Multi-tenancy considerations If you need **multi-tenancy**, you must manage table names manually, often by using **prefixes** for different datasets. Since QuestDB does not support multiple schemas, this is the primary way to segment data. In QuestDB Enterprise, you can [enforce permissions per table to restrict access](/docs/operations/rbac/), allowing finer control over multi-tenant environments. +Here are common patterns for implementing multi-tenancy: + +#### Customer-specific tables + +```questdb-sql +-- Customer-specific trading data +CREATE TABLE customer1_trades ( + timestamp TIMESTAMP, + symbol SYMBOL, + price DOUBLE +) TIMESTAMP(timestamp) PARTITION BY DAY; + +CREATE TABLE customer2_trades ( + timestamp TIMESTAMP, + symbol SYMBOL, + price DOUBLE +) TIMESTAMP(timestamp) PARTITION BY DAY; +``` + +#### Environment or region-based separation + +```questdb-sql +-- Production vs. Development environments +CREATE TABLE prod_metrics ( + timestamp TIMESTAMP, + metric_name SYMBOL, + value DOUBLE +) TIMESTAMP(timestamp); + +CREATE TABLE dev_metrics ( + timestamp TIMESTAMP, + metric_name SYMBOL, + value DOUBLE +) TIMESTAMP(timestamp); + +-- Regional data separation +CREATE TABLE eu_users ( + timestamp TIMESTAMP, + user_id SYMBOL, + action SYMBOL +) TIMESTAMP(timestamp); + +CREATE TABLE us_users ( + timestamp TIMESTAMP, + user_id SYMBOL, + action SYMBOL +) TIMESTAMP(timestamp); +``` + +#### Department or team-based separation + +```questdb-sql +-- Department-specific analytics +CREATE TABLE sales_daily_stats ( + timestamp TIMESTAMP, + region SYMBOL, + revenue DOUBLE +) TIMESTAMP(timestamp) PARTITION BY DAY; + +CREATE TABLE marketing_campaign_metrics ( + timestamp TIMESTAMP, + campaign_id SYMBOL, + clicks LONG, + impressions LONG +) TIMESTAMP(timestamp) PARTITION BY DAY; +``` + +:::tip + +When using table prefixes for multi-tenancy: +- Use consistent naming conventions (e.g., always `_`) +- 
Consider using uppercase for tenant identifiers to improve readability +- Document your naming convention in your team's schema design guidelines + +::: + ## PostgreSQL protocol compatibility QuestDB is **not** a PostgreSQL database but is **compatible with the [PostgreSQL wire protocol](/docs/reference/api/postgres/)**. This means you can connect using PostgreSQL-compatible libraries and clients and execute SQL commands. However, compatibility with PostgreSQL system catalogs, metadata queries, data types, and functions is limited. @@ -36,13 +112,13 @@ The easiest way to create a schema is through the **[Web Console](/docs/web-cons When using the **[Influx Line Protocol](/docs/reference/api/ilp/overview/) (ILP)**, QuestDB automatically creates tables and columns based on incoming data. This is useful for users migrating from InfluxDB or using tools like **InfluxDB client libraries or Telegraf**, as they can send data directly to QuestDB without pre-defining schemas. However, this comes with limitations: -- QuestDB applies **[default settings](/docs/configuration/)** to auto-created tables and columns (e.g., partitioning, symbol capacity, and data types). +- QuestDB applies **default settings** to auto-created tables and columns (e.g., partitioning, symbol capacity, and data types). - Users **cannot modify [partitioning](/docs/concept/partitions/) or [symbol capacity](/docs/concept/symbol/#usage-of-symbols) later**, so they should create tables explicitly beforehand. - Auto-creation can be [disabled via configuration](/docs/configuration/#influxdb-line-protocol-ilp). ## The designated timestamp and partitioning strategy -QuestDB is designed for time-series workloads. The database engine is optimized to perform exceptionally well for time-series queries. One of the most important optimizations in QuestDB is that data is physically stored ordered by incremental timestamp. 
The user must choose the **[designated timestamp](/docs/concept/designated-timestamp/)** when creating a table. +QuestDB is designed for time-series workloads. The database engine is optimized to perform exceptionally well for time-series queries. One of the most important optimizations in QuestDB is that data is physically stored and ordered by incremental timestamp. Therefore, the user must choose the **[designated timestamp](/docs/concept/designated-timestamp/)** when creating a table. The **designated timestamp** is crucial in QuestDB. It directly affects: @@ -73,7 +149,6 @@ QuestDB is **[columnar](/glossary/columnar-database/)**, meaning: - **Null values take [storage space](/docs/reference/sql/datatypes/#type-nullability)**, so it is recommended to avoid sparse tables where possible. - **Dense tables** (where most columns have values) are more efficient in terms of storage and query performance. If you cannot design a dense table, consider creating different tables for distinct record structures. - ## Data types and best practices ### Symbols (recommended for categorical data) @@ -119,13 +194,14 @@ QuestDB introduces a specialized [`SYMBOL`](/docs/concept/symbol) data type. Sym - **Deduplication in QuestDB happens on an exact timestamp and optionally a set of other columns (`UPSERT KEYS`)**. - **Deduplication has no noticeable performance penalty**. 
+ ## Schema decisions that cannot be easily changed Some table properties **cannot be modified after creation**, including: @@ -245,4 +321,3 @@ CREATE TABLE metrics ( PARTITION BY DAY WAL DEDUP UPSERT KEYS(timestamp, name); ``` - From f0db0ac0fcef8ffbea1fb38ff0881f38e832a6e8 Mon Sep 17 00:00:00 2001 From: goodroot <9484709+goodroot@users.noreply.github.com> Date: Mon, 3 Mar 2025 10:08:28 -0800 Subject: [PATCH 12/13] polish --- documentation/guides/schema-design-essentials.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index a14c43183..61e0cd1a5 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -1,5 +1,5 @@ --- -title: Schema design essentials +title: Schema Design Essentials slug: schema-design-essentials description: Learn how to design efficient schemas in QuestDB. This guide covers best practices for partitioning, indexing, symbols, timestamps, deduplication, retention strategies, and schema modifications to optimize performance in time-series workloads From 030eafa615a616e7a71896fd741a657fa5da542f Mon Sep 17 00:00:00 2001 From: goodroot <9484709+goodroot@users.noreply.github.com> Date: Mon, 3 Mar 2025 10:12:38 -0800 Subject: [PATCH 13/13] add schema design content --- documentation/guides/schema-design-essentials.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/documentation/guides/schema-design-essentials.md b/documentation/guides/schema-design-essentials.md index 61e0cd1a5..734ffe2d4 100644 --- a/documentation/guides/schema-design-essentials.md +++ b/documentation/guides/schema-design-essentials.md @@ -194,14 +194,13 @@ QuestDB introduces a specialized [`SYMBOL`](/docs/concept/symbol) data type. Sym - **Deduplication in QuestDB happens on an exact timestamp and optionally a set of other columns (`UPSERT KEYS`)**. 
- **Deduplication has no noticeable performance penalty**. - + ## Schema decisions that cannot be easily changed Some table properties **cannot be modified after creation**, including: