14 changes: 7 additions & 7 deletions docs/about-us/distinctive-features.md
@@ -13,15 +13,15 @@ keywords: ['compression', 'secondary-indexes','column-oriented']

In a real column-oriented DBMS, no extra data is stored with the values. This means that constant-length values must be supported to avoid storing their length "number" next to the values. For example, a billion UInt8-type values should consume around 1 GB uncompressed; otherwise this strongly affects CPU use. It is essential to store data compactly (without any "garbage") even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
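A quick back-of-the-envelope check of the claim above (plain arithmetic, not ClickHouse code; the 1-byte length prefix is a hypothetical illustration of the "garbage" a fixed-width layout avoids):

```python
# A billion fixed-width UInt8 values occupy exactly 1 byte each,
# with no per-value length stored alongside them.
n = 1_000_000_000
compact = n * 1              # fixed-width storage: 1 byte per value
with_lengths = n * (1 + 1)   # hypothetical 1-byte length prefix per value

print(compact / 10**9)       # 1.0 (GB, as stated above)
print(with_lengths / compact)  # 2.0 (blow-up from storing lengths)
```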

This is in contrast to systems that can store values of different columns separately, but that cannot effectively process analytical queries due to their optimization for other scenarios, such as HBase, Bigtable, Cassandra, and Hypertable. You would get throughput around a hundred thousand rows per second in these systems, but not hundreds of millions of rows per second.
This is in contrast to systems that can store values of different columns separately, but that cannot effectively process analytical queries due to their optimization for other scenarios, such as HBase, Bigtable, Cassandra, and Hypertable. You would get throughput of around a hundred thousand rows per second in these systems, but not hundreds of millions of rows per second.

Finally, ClickHouse is a database management system, not a single database. It allows creating tables and databases at runtime, loading data, and running queries without reconfiguring and restarting the server.
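For instance, a database and table can be created and queried against a running server without any restart (a sketch; the names and schema are illustrative):

```sql
CREATE DATABASE IF NOT EXISTS example_db;

CREATE TABLE example_db.events
(
    ts    DateTime,
    name  String,
    value UInt32
)
ENGINE = MergeTree
ORDER BY ts;

INSERT INTO example_db.events VALUES (now(), 'page_view', 1);

SELECT count() FROM example_db.events;
```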

## Data compression {#data-compression}

Some column-oriented DBMSs do not use data compression. However, data compression plays a key role in achieving excellent performance.

In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse provides [specialized codecs](/sql-reference/statements/create/table.md#specialized-codecs) for specific kinds of data, which allow ClickHouse to compete with and outperform more niche databases, like time-series ones.
In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse provides [specialized codecs](/sql-reference/statements/create/table.md#specialized-codecs) for specific kinds of data, which allows ClickHouse to compete with and outperform more niche databases, like time-series ones.
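As a sketch, a time-series-style table can combine specialized codecs with a general-purpose one per column (table and column names are illustrative):

```sql
CREATE TABLE example_metrics
(
    ts    DateTime CODEC(Delta, ZSTD),   -- delta-encode timestamps, then compress
    value Float64  CODEC(Gorilla, ZSTD)  -- Gorilla suits slowly-changing float gauges
)
ENGINE = MergeTree
ORDER BY ts;
```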

## Disk storage of data {#disk-storage-of-data}

@@ -41,9 +41,9 @@ In ClickHouse, data can reside on different shards. Each shard can be a group of

## SQL support {#sql-support}

ClickHouse supports [SQL language](/sql-reference/) that is mostly compatible with the ANSI SQL standard.
ClickHouse supports [a declarative query language](/sql-reference/) based on SQL that is mostly compatible with the ANSI SQL standard.

Supported queries include [GROUP BY](../sql-reference/statements/select/group-by.md), [ORDER BY](../sql-reference/statements/select/order-by.md), subqueries in [FROM](../sql-reference/statements/select/from.md), [JOIN](../sql-reference/statements/select/join.md) clause, [IN](../sql-reference/operators/in.md) operator, [window functions](../sql-reference/window-functions/index.md) and scalar subqueries.
Supported queries include [GROUP BY](../sql-reference/statements/select/group-by.md), [ORDER BY](../sql-reference/statements/select/order-by.md), subqueries in [FROM](../sql-reference/statements/select/from.md), the [JOIN](../sql-reference/statements/select/join.md) clause, the [IN](../sql-reference/operators/in.md) operator, [window functions](../sql-reference/window-functions/index.md) and scalar subqueries.

Correlated (dependent) subqueries are not supported at the time of writing but might become available in the future.
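A sketch combining several of these supported constructs (table names are illustrative; both subqueries are uncorrelated):

```sql
-- Subquery in FROM, GROUP BY, ORDER BY:
SELECT user_id, count() AS visits
FROM (SELECT user_id FROM visits WHERE duration > 10)
GROUP BY user_id
ORDER BY visits DESC;

-- IN operator with a subquery, plus a window function:
SELECT
    user_id,
    visit_date,
    sum(duration) OVER (PARTITION BY user_id ORDER BY visit_date) AS running_total
FROM visits
WHERE user_id IN (SELECT user_id FROM vip_users)
ORDER BY user_id, visit_date;
```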

@@ -67,7 +67,7 @@ Unlike other database management systems, secondary indexes in ClickHouse do not

Most OLAP database management systems do not aim for online queries with sub-second latencies. In alternative systems, report building time of tens of seconds or even minutes is often considered acceptable. Sometimes it takes even more time, which forces systems to prepare reports offline (in advance or by responding with "come back later").

In ClickHouse "low latency" means that queries can be processed without delay and without trying to prepare an answer in advance, right at the same moment as the user interface page is loading. In other words, online.
In ClickHouse, "low latency" means that queries can be processed without delay and without trying to prepare an answer in advance, right at the moment when the user interface page is loading — in other words, *online*.

## Support for approximated calculations {#support-for-approximated-calculations}

@@ -79,7 +79,7 @@ ClickHouse provides various ways to trade accuracy for performance:
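One typical accuracy-for-performance trade is approximate distinct counting (a sketch; `uniq` is approximate while `uniqExact` is exact, and the table name is illustrative):

```sql
SELECT
    uniqExact(user_id) AS exact_count,  -- exact, but more memory and CPU
    uniq(user_id)      AS approx_count  -- approximate, much cheaper on large data
FROM visits;
```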

## Adaptive join algorithm {#adaptive-join-algorithm}

ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash-join algorithm and falling back to the merge-join algorithm if there's more than one large table.
ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash join and falling back to merge join if there's more than one large table.
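The choice can also be steered explicitly via the `join_algorithm` setting (a sketch; table names are illustrative, and the exact default and fallback behavior vary by version):

```sql
-- 'auto' tries an in-memory hash join first and falls back to a
-- merge-based join when memory limits are exceeded.
SET join_algorithm = 'auto';

SELECT t1.id, t2.value
FROM large_table_1 AS t1
JOIN large_table_2 AS t2 ON t1.id = t2.id;
```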

## Data replication and data integrity support {#data-replication-and-data-integrity-support}

@@ -89,7 +89,7 @@ For more information, see the section [Data replication](../engines/table-engine

## Role-Based Access Control {#role-based-access-control}

ClickHouse implements user account management using SQL queries and allows for [role-based access control configuration](/guides/sre/user-management/index.md) similar to what can be found in ANSI SQL standard and popular relational database management systems.
ClickHouse implements user account management using SQL queries and allows for [role-based access control configuration](/guides/sre/user-management/index.md) similar to what can be found in the ANSI SQL standard and popular relational database management systems.

## Features that can be considered disadvantages {#clickhouse-features-that-can-be-considered-disadvantages}

12 changes: 6 additions & 6 deletions docs/concepts/olap.md
@@ -11,17 +11,17 @@ keywords: ['OLAP']

[OLAP](https://en.wikipedia.org/wiki/Online_analytical_processing) stands for Online Analytical Processing. It is a broad term that can be looked at from two perspectives: technical and business. At the highest level, you can just read these words backward:

**Processing** some source data is processed…
**Processing** — Some source data is processed…

**Analytical** …to produce some analytical reports and insights…
**Analytical** …to produce some analytical reports and insights…

**Online** …in real-time.
**Online** …in real-time.

## OLAP from the business perspective {#olap-from-the-business-perspective}

In recent years business people started to realize the value of data. Companies who make their decisions blindly more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be even remotely useful for making business decisions, and imposes on them a need for mechanisms which allow them to analyze this data in a timely manner. Here's where OLAP database management systems (DBMS) come in.
In recent years, business people have started to realize the value of data. Companies that make their decisions blindly more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be even remotely useful for making business decisions, and imposes on them a need for mechanisms which allow them to analyze this data in a timely manner. Here's where OLAP database management systems (DBMS) come in.

In a business sense, OLAP allows companies to continuously plan, analyze, and report operational activities, thus maximizing efficiency, reducing expenses, and ultimately conquering the market share. It could be done either in an in-house system or outsourced to SaaS providers like web/mobile analytics services, CRM services, etc. OLAP is the technology behind many BI applications (Business Intelligence).
In a business sense, OLAP allows companies to continuously plan, analyze, and report operational activities, thus maximizing efficiency, reducing expenses, and ultimately winning market share. It could be done either in an in-house system or outsourced to SaaS providers like web/mobile analytics services, CRM services, etc. OLAP is the technology behind many BI (business intelligence) applications.

ClickHouse is an OLAP database management system that is pretty often used as a backend for those SaaS solutions for analyzing domain-specific data. However, some businesses are still reluctant to share their data with third-party providers and so an in-house data warehouse scenario is also viable.

@@ -35,5 +35,5 @@ Even if a DBMS started out as a pure OLAP or pure OLTP, it is forced to move in

The fundamental trade-off between OLAP and OLTP systems remains:

- To build analytical reports efficiently it's crucial to be able to read columns separately, thus most OLAP databases are [columnar](https://clickhouse.com/engineering-resources/what-is-columnar-database),
- To build analytical reports efficiently it's crucial to be able to read columns separately, thus most OLAP databases are [columnar](https://clickhouse.com/engineering-resources/what-is-columnar-database);
- Storing columns separately increases the cost of operations on rows, like appends or in-place modifications, proportionally to the number of columns (which can be huge if the system tries to collect all details of an event just in case). Thus, most OLTP systems store data arranged by rows.
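The trade-off in the two bullets above can be sketched with plain Python lists (an illustration of the layouts, not ClickHouse internals):

```python
# Row-oriented: each record's fields are stored together.
# Column-oriented: each column is contiguous.
rows = [(i, f"user{i}", i % 100) for i in range(1000)]  # row store
cols = {                                                # column store
    "id":    [r[0] for r in rows],
    "name":  [r[1] for r in rows],
    "score": [r[2] for r in rows],
}

# Analytical query: average score. The column store scans one list;
# the row store must walk every field of every tuple.
avg_from_cols = sum(cols["score"]) / len(cols["score"])
avg_from_rows = sum(r[2] for r in rows) / len(rows)
assert avg_from_cols == avg_from_rows

# Appending one row is a single write in the row store, but one write
# per column in the column store: the OLTP-side cost mentioned above.
rows.append((1000, "user1000", 0))
for name, value in zip(cols, (1000, "user1000", 0)):
    cols[name].append(value)
```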
4 changes: 2 additions & 2 deletions docs/concepts/why-clickhouse-is-so-fast.mdx
@@ -135,12 +135,12 @@ Algorithms that rely on data characteristics often perform better than their gen
## VLDB 2024 paper {#vldb-2024-paper}

In August 2024, we had our first research paper accepted and published at VLDB.
VLDB in an international conference on very large databases, and is widely regarded as one of the leading conferences in the field of data management.
VLDB is an international conference on very large databases, and is widely regarded as one of the leading conferences in the field of data management.
Out of hundreds of submissions, VLDB generally has an acceptance rate of ~20%.

You can read a [PDF of the paper](https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf) or our [web version](/docs/academic_overview) of it, which gives a concise description of ClickHouse's most interesting architectural and system design components that make it so fast.

Alexey Milovidov, our CTO and the creator of ClickHouse, presented the paper (slides [here](https://raw.githubusercontent.com/ClickHouse/clickhouse-presentations/master/2024-vldb/VLDB_2024_presentation.pdf)), followed by a Q&A (that quickly ran out of time!).
You can catch the recorded presentation here:

<iframe width="1024" height="576" src="https://www.youtube.com/embed/7QXKBKDOkJE?si=5uFerjqPSXQWqDkF" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<iframe width="1024" height="576" src="https://www.youtube.com/embed/7QXKBKDOkJE?si=5uFerjqPSXQWqDkF" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
2 changes: 1 addition & 1 deletion docs/deployment-modes.md
@@ -67,7 +67,7 @@ The combination of remote table functions and access to the local file system ma

## chDB {#chdb}

[chDB](/chdb) is ClickHouse embedded as an in-process database engine,, with Python being the primary implementation, though it's also available for Go, Rust, NodeJS, and Bun. This deployment option brings ClickHouse's powerful OLAP capabilities directly into your application's process, eliminating the need for a separate database installation.
[chDB](/chdb) is ClickHouse embedded as an in-process database engine, with Python being the primary implementation, though it's also available for Go, Rust, NodeJS, and Bun. This deployment option brings ClickHouse's powerful OLAP capabilities directly into your application's process, eliminating the need for a separate database installation.

<Image img={chDB} alt="chDB - Embedded ClickHouse" size="sm"/>

4 changes: 2 additions & 2 deletions docs/getting-started/install/_snippets/_macos.md
@@ -62,7 +62,7 @@ This should output something like:
/opt/homebrew/bin/clickhouse
```

Remove `clickhouse` from the quarantine bin by running `xattr -d com.apple.quarantine` following by the path from the previous command:
Remove `clickhouse` from the quarantine bin by running `xattr -d com.apple.quarantine` followed by the path from the previous command:

```shell
xattr -d com.apple.quarantine /opt/homebrew/bin/clickhouse
@@ -81,7 +81,7 @@ Use one of the following commands:
clickhouse local [args]
clickhouse client [args]
clickhouse benchmark [args]
...
```

## Fix the issue by reinstalling ClickHouse {#fix-issue}

6 changes: 3 additions & 3 deletions docs/getting-started/install/_snippets/_quick_install.md
@@ -60,8 +60,8 @@
local-host :)
```

Table data is stored in the current directory and still available after a restart
of ClickHouse server. If necessary, you can pass
Table data is stored in the current directory and will still be available after a restart
of the ClickHouse server. If necessary, you can pass
`-C config.xml` as an additional command line argument to `./clickhouse server`
and provide further configuration in a configuration
file. All available configuration settings are documented [here](/operations/server-configuration-parameters/settings) and in the
@@ -71,7 +71,7 @@
You are now ready to start sending SQL commands to ClickHouse!

:::tip
The [Quick Start](/get-started/quick-start) walks through the steps for creating tables and inserting data.
The [Quick Start](/get-started/quick-start) walks you through the steps for creating tables and inserting data.
:::

</VerticalStepper>
10 changes: 5 additions & 5 deletions docs/getting-started/install/advanced.md
@@ -36,14 +36,14 @@

:::note
Since ClickHouse's CI is evolving over time, the exact steps to download CI-generated builds may vary.
Also, CI may delete too old build artifacts, making them unavailable for download.
Also, CI may delete old build artifacts, making them unavailable for download.
:::

For example, to download a aarch64 binary for ClickHouse v23.4, follow these steps:
For example, to download an aarch64 binary for ClickHouse v23.4, follow these steps:

- Find the GitHub pull request for release v23.4: [Release pull request for branch 23.4](https://github.com/ClickHouse/ClickHouse/pull/49238)
- Click "Commits", then click a commit similar to "Update autogenerated version to 23.4.2.1 and contributors" for the particular version you like to install.
- Click "Commits", then click on a commit similar to "Update autogenerated version to 23.4.2.1 and contributors" for the particular version you'd like to install.
- Click the green check / yellow dot / red cross to open the list of CI checks.
- Click "Details" next to "Builds" in the list, it will open a page similar to [this page](https://s3.amazonaws.com/clickhouse-test-reports/46793/b460eb70bf29b19eadd19a1f959b15d186705394/clickhouse_build_check/report.html)
- Find the rows with compiler = "clang-*-aarch64" - there are multiple rows.
- Click "Details" next to "Builds" in the list; it will open a page similar to [this page](https://s3.amazonaws.com/clickhouse-test-reports/46793/b460eb70bf29b19eadd19a1f959b15d186705394/clickhouse_build_check/report.html).
- Find the rows with compiler = "clang-*-aarch64"; there are multiple rows.
- Download the artifacts for these builds.
2 changes: 1 addition & 1 deletion docs/guides/developer/mutations.md
@@ -89,7 +89,7 @@

## Lightweight deletes {#lightweight-deletes}

Another option for deleting rows it to use the `DELETE FROM` command, which is referred to as a **lightweight delete**. The deleted rows are marked as deleted immediately and will be automatically filtered out of all subsequent queries, so you do not have to wait for a merging of parts or use the `FINAL` keyword. Cleanup of data happens asynchronously in the background.
Another option for deleting rows is to use the `DELETE FROM` command, which is referred to as a **lightweight delete**. The deleted rows are marked as deleted immediately and will be automatically filtered out of all subsequent queries, so you do not have to wait for a merging of parts or use the `FINAL` keyword. Cleanup of data happens asynchronously in the background.

``` sql
DELETE FROM [db.]table [ON CLUSTER cluster] [WHERE expr]
4 changes: 2 additions & 2 deletions docs/guides/inserting-data.md
@@ -137,7 +137,7 @@ Unlike many traditional databases, ClickHouse supports an HTTP interface.
Users can use this for both inserting and querying data, using any of the above formats.
This is often preferable to ClickHouse's native protocol as it allows traffic to be easily switched with load balancers.
We expect small differences in insert performance with the native protocol, which incurs a little less overhead.
Existing clients use either of these protocols ( in some cases both e.g. the Go client).
Existing clients use either of these protocols (in some cases both, e.g., the Go client).
The native protocol does allow query progress to be easily tracked.
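As a sketch, an insert-then-query flow over the HTTP interface might look like this (assumes a server listening on the default port 8123; the table name is illustrative):

```shell
# Insert a row via HTTP; the query is sent in the request body.
echo "INSERT INTO events VALUES (now(), 'page_view', 1)" | \
    curl "http://localhost:8123/" --data-binary @-

# Query it back; a FORMAT clause could be appended to control the response encoding.
curl "http://localhost:8123/?query=SELECT+count()+FROM+events"
```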

See [HTTP Interface](/interfaces/http) for further details.
Expand All @@ -149,7 +149,7 @@ For loading data from Postgres, users can use:
- `PeerDB by ClickHouse`, an ETL tool specifically designed for PostgreSQL database replication. This is available in both:
- ClickHouse Cloud - available through our [new connector](/integrations/clickpipes/postgres) in ClickPipes, our managed ingestion service.
- Self-managed - via the [open-source project](https://github.com/PeerDB-io/peerdb).
- The [PostgreSQL table engine](/integrations/postgresql#using-the-postgresql-table-engine) to read data directly as shown in previous examples. Typically appropriate if batch replication based on a known watermark, e.g., timestamp, is sufficient or if it's a one-off migration. This approach can scale to 10's millions of rows. Users looking to migrate larger datasets should consider multiple requests, each dealing with a chunk of the data. Staging tables can be used for each chunk prior to its partitions being moved to a final table. This allows failed requests to be retried. For further details on this bulk-loading strategy, see here.
- The [PostgreSQL table engine](/integrations/postgresql#using-the-postgresql-table-engine) to read data directly as shown in previous examples. Typically appropriate if batch replication based on a known watermark, e.g., timestamp, is sufficient or if it's a one-off migration. This approach can scale to tens of millions of rows. Users looking to migrate larger datasets should consider multiple requests, each dealing with a chunk of the data. Staging tables can be used for each chunk prior to its partitions being moved to a final table. This allows failed requests to be retried. For further details on this bulk-loading strategy, see here.
- Data can be exported from PostgreSQL in CSV format. This can then be inserted into ClickHouse from either local files or via object storage using table functions.
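The table-engine route in the list above might look like this (a sketch; the host, credentials, schema, and watermark column are placeholders):

```sql
CREATE TABLE pg_orders
(
    id     UInt64,
    amount Decimal(10, 2)
)
ENGINE = PostgreSQL('postgres-host:5432', 'shop', 'orders', 'pg_user', 'pg_password');

-- Batch-copy into a native ClickHouse table, filtered by a known watermark.
INSERT INTO orders
SELECT * FROM pg_orders
WHERE id > (SELECT max(id) FROM orders);
```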

:::note Need help inserting large datasets?