14 changes: 7 additions & 7 deletions docs/about-us/distinctive-features.md
@@ -13,15 +13,15 @@ keywords: ['compression', 'secondary-indexes','column-oriented']

In a real column-oriented DBMS, no extra data is stored with the values. This means that constant-length values must be supported to avoid storing their length "number" next to the values. For example, a billion UInt8-type values should consume around 1 GB uncompressed; otherwise this strongly affects CPU use. It is essential to store data compactly (without any "garbage") even when uncompressed, since the speed of decompression (CPU usage) depends mainly on the volume of uncompressed data.
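A quick back-of-the-envelope check of the claim above (plain arithmetic, not ClickHouse code; the 1-byte length prefix is a hypothetical illustration of the "garbage" a fixed-width layout avoids):

```python
# A billion fixed-width UInt8 values occupy exactly 1 byte each,
# with no per-value length stored alongside them.
n = 1_000_000_000
compact = n * 1              # fixed-width storage: 1 byte per value
with_lengths = n * (1 + 1)   # hypothetical 1-byte length prefix per value

print(compact / 10**9)       # 1.0 (GB, as stated above)
print(with_lengths / compact)  # 2.0 (blow-up from storing lengths)
```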

This is in contrast to systems that can store values of different columns separately, but that cannot effectively process analytical queries due to their optimization for other scenarios, such as HBase, Bigtable, Cassandra, and Hypertable. You would get throughput around a hundred thousand rows per second in these systems, but not hundreds of millions of rows per second.
This is in contrast to systems that can store values of different columns separately, but that cannot effectively process analytical queries due to their optimization for other scenarios, such as HBase, Bigtable, Cassandra, and Hypertable. You would get throughput of around a hundred thousand rows per second in these systems, but not hundreds of millions of rows per second.

Finally, ClickHouse is a database management system, not a single database. It allows creating tables and databases at runtime, loading data, and running queries without reconfiguring and restarting the server.
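For instance, a database and table can be created and queried against a running server without any restart (a sketch; the names and schema are illustrative):

```sql
CREATE DATABASE IF NOT EXISTS example_db;

CREATE TABLE example_db.events
(
    ts    DateTime,
    name  String,
    value UInt32
)
ENGINE = MergeTree
ORDER BY ts;

INSERT INTO example_db.events VALUES (now(), 'page_view', 1);

SELECT count() FROM example_db.events;
```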

## Data compression {#data-compression}

Some column-oriented DBMSs do not use data compression. However, data compression plays a key role in achieving excellent performance.

In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse provides [specialized codecs](/sql-reference/statements/create/table.md#specialized-codecs) for specific kinds of data, which allow ClickHouse to compete with and outperform more niche databases, like time-series ones.
In addition to efficient general-purpose compression codecs with different trade-offs between disk space and CPU consumption, ClickHouse provides [specialized codecs](/sql-reference/statements/create/table.md#specialized-codecs) for specific kinds of data, which allows ClickHouse to compete with and outperform more niche databases, like time-series ones.
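As a sketch, a time-series-style table can combine specialized codecs with a general-purpose one per column (table and column names are illustrative):

```sql
CREATE TABLE example_metrics
(
    ts    DateTime CODEC(Delta, ZSTD),   -- delta-encode timestamps, then compress
    value Float64  CODEC(Gorilla, ZSTD)  -- Gorilla suits slowly-changing float gauges
)
ENGINE = MergeTree
ORDER BY ts;
```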

## Disk storage of data {#disk-storage-of-data}

@@ -41,9 +41,9 @@ In ClickHouse, data can reside on different shards. Each shard can be a group of

## SQL support {#sql-support}

ClickHouse supports [SQL language](/sql-reference/) that is mostly compatible with the ANSI SQL standard.
ClickHouse supports [a declarative query language](/sql-reference/) based on SQL that is mostly compatible with the ANSI SQL standard.

Supported queries include [GROUP BY](../sql-reference/statements/select/group-by.md), [ORDER BY](../sql-reference/statements/select/order-by.md), subqueries in [FROM](../sql-reference/statements/select/from.md), [JOIN](../sql-reference/statements/select/join.md) clause, [IN](../sql-reference/operators/in.md) operator, [window functions](../sql-reference/window-functions/index.md) and scalar subqueries.
Supported queries include [GROUP BY](../sql-reference/statements/select/group-by.md), [ORDER BY](../sql-reference/statements/select/order-by.md), subqueries in [FROM](../sql-reference/statements/select/from.md), the [JOIN](../sql-reference/statements/select/join.md) clause, the [IN](../sql-reference/operators/in.md) operator, [window functions](../sql-reference/window-functions/index.md) and scalar subqueries.

Correlated (dependent) subqueries are not supported at the time of writing but might become available in the future.
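A sketch combining several of these supported constructs (table names are illustrative; both subqueries are uncorrelated):

```sql
-- Subquery in FROM, GROUP BY, ORDER BY:
SELECT user_id, count() AS visits
FROM (SELECT user_id FROM visits WHERE duration > 10)
GROUP BY user_id
ORDER BY visits DESC;

-- IN operator with a subquery, plus a window function:
SELECT
    user_id,
    visit_date,
    sum(duration) OVER (PARTITION BY user_id ORDER BY visit_date) AS running_total
FROM visits
WHERE user_id IN (SELECT user_id FROM vip_users)
ORDER BY user_id, visit_date;
```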

@@ -67,7 +67,7 @@ Unlike other database management systems, secondary indexes in ClickHouse do not

Most OLAP database management systems do not aim for online queries with sub-second latencies. In alternative systems, report building time of tens of seconds or even minutes is often considered acceptable. Sometimes it takes even more time, which forces systems to prepare reports offline (in advance or by responding with "come back later").

In ClickHouse "low latency" means that queries can be processed without delay and without trying to prepare an answer in advance, right at the same moment as the user interface page is loading. In other words, online.
In ClickHouse, "low latency" means that queries can be processed without delay and without trying to prepare an answer in advance, right at the moment when the user interface page is loading — in other words, *online*.

## Support for approximated calculations {#support-for-approximated-calculations}

@@ -79,7 +79,7 @@ ClickHouse provides various ways to trade accuracy for performance:
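One typical accuracy-for-performance trade is approximate distinct counting (a sketch; `uniq` is approximate while `uniqExact` is exact, and the table name is illustrative):

```sql
SELECT
    uniqExact(user_id) AS exact_count,  -- exact, but more memory and CPU
    uniq(user_id)      AS approx_count  -- approximate, much cheaper on large data
FROM visits;
```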

## Adaptive join algorithm {#adaptive-join-algorithm}

ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash-join algorithm and falling back to the merge-join algorithm if there's more than one large table.
ClickHouse adaptively chooses how to [JOIN](../sql-reference/statements/select/join.md) multiple tables, by preferring hash join and falling back to merge join if there's more than one large table.
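The choice can also be steered explicitly via the `join_algorithm` setting (a sketch; table names are illustrative, and the exact default and fallback behavior vary by version):

```sql
-- 'auto' tries an in-memory hash join first and falls back to a
-- merge-based join when memory limits are exceeded.
SET join_algorithm = 'auto';

SELECT t1.id, t2.value
FROM large_table_1 AS t1
JOIN large_table_2 AS t2 ON t1.id = t2.id;
```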

## Data replication and data integrity support {#data-replication-and-data-integrity-support}

@@ -89,7 +89,7 @@ For more information, see the section [Data replication](../engines/table-engine

## Role-Based Access Control {#role-based-access-control}

ClickHouse implements user account management using SQL queries and allows for [role-based access control configuration](/guides/sre/user-management/index.md) similar to what can be found in ANSI SQL standard and popular relational database management systems.
ClickHouse implements user account management using SQL queries and allows for [role-based access control configuration](/guides/sre/user-management/index.md) similar to what can be found in the ANSI SQL standard and popular relational database management systems.

## Features that can be considered disadvantages {#clickhouse-features-that-can-be-considered-disadvantages}

12 changes: 6 additions & 6 deletions docs/concepts/olap.md
@@ -11,17 +11,17 @@ keywords: ['OLAP']

[OLAP](https://en.wikipedia.org/wiki/Online_analytical_processing) stands for Online Analytical Processing. It is a broad term that can be looked at from two perspectives: technical and business. At the highest level, you can just read these words backward:

**Processing** some source data is processed…
**Processing** — Some source data is processed…

**Analytical** …to produce some analytical reports and insights…
**Analytical** …to produce some analytical reports and insights…

**Online** …in real-time.
**Online** …in real-time.

## OLAP from the business perspective {#olap-from-the-business-perspective}

In recent years business people started to realize the value of data. Companies who make their decisions blindly more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be even remotely useful for making business decisions, and imposes on them a need for mechanisms which allow them to analyze this data in a timely manner. Here's where OLAP database management systems (DBMS) come in.
In recent years, business people have started to realize the value of data. Companies that make their decisions blindly more often than not fail to keep up with the competition. The data-driven approach of successful companies forces them to collect all data that might be even remotely useful for making business decisions, and imposes on them a need for mechanisms which allow them to analyze this data in a timely manner. Here's where OLAP database management systems (DBMS) come in.

In a business sense, OLAP allows companies to continuously plan, analyze, and report operational activities, thus maximizing efficiency, reducing expenses, and ultimately conquering the market share. It could be done either in an in-house system or outsourced to SaaS providers like web/mobile analytics services, CRM services, etc. OLAP is the technology behind many BI applications (Business Intelligence).
In a business sense, OLAP allows companies to continuously plan, analyze, and report operational activities, thus maximizing efficiency, reducing expenses, and ultimately winning market share. It could be done either in an in-house system or outsourced to SaaS providers like web/mobile analytics services, CRM services, etc. OLAP is the technology behind many BI (business intelligence) applications.

ClickHouse is an OLAP database management system that is pretty often used as a backend for those SaaS solutions for analyzing domain-specific data. However, some businesses are still reluctant to share their data with third-party providers and so an in-house data warehouse scenario is also viable.

@@ -35,5 +35,5 @@ Even if a DBMS started out as a pure OLAP or pure OLTP, it is forced to move in

The fundamental trade-off between OLAP and OLTP systems remains:

- To build analytical reports efficiently it's crucial to be able to read columns separately, thus most OLAP databases are [columnar](https://clickhouse.com/engineering-resources/what-is-columnar-database),
- To build analytical reports efficiently it's crucial to be able to read columns separately, thus most OLAP databases are [columnar](https://clickhouse.com/engineering-resources/what-is-columnar-database);
- Storing columns separately increases the cost of operations on rows, like appends or in-place modifications, proportionally to the number of columns (which can be huge if the system tries to collect all details of an event just in case). Thus, most OLTP systems store data arranged by rows.
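The trade-off in the two bullets above can be sketched with plain Python lists (an illustration of the layouts, not ClickHouse internals):

```python
# Row-oriented: each record's fields are stored together.
# Column-oriented: each column is contiguous.
rows = [(i, f"user{i}", i % 100) for i in range(1000)]  # row store
cols = {                                                # column store
    "id":    [r[0] for r in rows],
    "name":  [r[1] for r in rows],
    "score": [r[2] for r in rows],
}

# Analytical query: average score. The column store scans one list;
# the row store must walk every field of every tuple.
avg_from_cols = sum(cols["score"]) / len(cols["score"])
avg_from_rows = sum(r[2] for r in rows) / len(rows)
assert avg_from_cols == avg_from_rows

# Appending one row is a single write in the row store, but one write
# per column in the column store: the OLTP-side cost mentioned above.
rows.append((1000, "user1000", 0))
for name, value in zip(cols, (1000, "user1000", 0)):
    cols[name].append(value)
```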
4 changes: 2 additions & 2 deletions docs/concepts/why-clickhouse-is-so-fast.mdx
@@ -135,12 +135,12 @@ Algorithms that rely on data characteristics often perform better than their gen
## VLDB 2024 paper {#vldb-2024-paper}

In August 2024, we had our first research paper accepted and published at VLDB.
VLDB in an international conference on very large databases, and is widely regarded as one of the leading conferences in the field of data management.
VLDB is an international conference on very large databases, and is widely regarded as one of the leading conferences in the field of data management.
Out of hundreds of submissions, VLDB generally has an acceptance rate of ~20%.

You can read a [PDF of the paper](https://www.vldb.org/pvldb/vol17/p3731-schulze.pdf) or our [web version](/docs/academic_overview) of it, which gives a concise description of ClickHouse's most interesting architectural and system design components that make it so fast.

Alexey Milovidov, our CTO and the creator of ClickHouse, presented the paper (slides [here](https://raw.githubusercontent.com/ClickHouse/clickhouse-presentations/master/2024-vldb/VLDB_2024_presentation.pdf)), followed by a Q&A (that quickly ran out of time!).
You can catch the recorded presentation here:

<iframe width="1024" height="576" src="https://www.youtube.com/embed/7QXKBKDOkJE?si=5uFerjqPSXQWqDkF" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
<iframe width="1024" height="576" src="https://www.youtube.com/embed/7QXKBKDOkJE?si=5uFerjqPSXQWqDkF" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>
2 changes: 1 addition & 1 deletion docs/deployment-modes.md
@@ -67,7 +67,7 @@ The combination of remote table functions and access to the local file system ma

## chDB {#chdb}

[chDB](/chdb) is ClickHouse embedded as an in-process database engine,, with Python being the primary implementation, though it's also available for Go, Rust, NodeJS, and Bun. This deployment option brings ClickHouse's powerful OLAP capabilities directly into your application's process, eliminating the need for a separate database installation.
[chDB](/chdb) is ClickHouse embedded as an in-process database engine, with Python being the primary implementation, though it's also available for Go, Rust, NodeJS, and Bun. This deployment option brings ClickHouse's powerful OLAP capabilities directly into your application's process, eliminating the need for a separate database installation.

<Image img={chDB} alt="chDB - Embedded ClickHouse" size="sm"/>

4 changes: 2 additions & 2 deletions docs/getting-started/install/_snippets/_macos.md
@@ -62,7 +62,7 @@ This should output something like:
/opt/homebrew/bin/clickhouse
```

Remove `clickhouse` from the quarantine bin by running `xattr -d com.apple.quarantine` following by the path from the previous command:
Remove `clickhouse` from the quarantine bin by running `xattr -d com.apple.quarantine` followed by the path from the previous command:

```shell
xattr -d com.apple.quarantine /opt/homebrew/bin/clickhouse
@@ -81,7 +81,7 @@ Use one of the following commands:
clickhouse local [args]
clickhouse client [args]
clickhouse benchmark [args]
...
```

## Fix the issue by reinstalling ClickHouse {#fix-issue}

6 changes: 3 additions & 3 deletions docs/getting-started/install/_snippets/_quick_install.md
@@ -60,8 +60,8 @@
local-host :)
```

Table data is stored in the current directory and still available after a restart
of ClickHouse server. If necessary, you can pass
Table data is stored in the current directory and will still be available after a restart
of the ClickHouse server. If necessary, you can pass
`-C config.xml` as an additional command line argument to `./clickhouse server`
and provide further configuration in a configuration
file. All available configuration settings are documented [here](/operations/server-configuration-parameters/settings) and in the
@@ -71,7 +71,7 @@
You are now ready to start sending SQL commands to ClickHouse!

:::tip
The [Quick Start](/get-started/quick-start) walks through the steps for creating tables and inserting data.
The [Quick Start](/get-started/quick-start) walks you through the steps for creating tables and inserting data.
:::

</VerticalStepper>
10 changes: 5 additions & 5 deletions docs/getting-started/install/advanced.md
@@ -36,14 +36,14 @@

:::note
Since ClickHouse's CI is evolving over time, the exact steps to download CI-generated builds may vary.
Also, CI may delete too old build artifacts, making them unavailable for download.
Also, CI may delete old build artifacts, making them unavailable for download.
:::

For example, to download a aarch64 binary for ClickHouse v23.4, follow these steps:
For example, to download an aarch64 binary for ClickHouse v23.4, follow these steps:

- Find the GitHub pull request for release v23.4: [Release pull request for branch 23.4](https://github.com/ClickHouse/ClickHouse/pull/49238)
- Click "Commits", then click a commit similar to "Update autogenerated version to 23.4.2.1 and contributors" for the particular version you like to install.
- Click "Commits", then click on a commit similar to "Update autogenerated version to 23.4.2.1 and contributors" for the particular version you'd like to install.
- Click the green check / yellow dot / red cross to open the list of CI checks.
- Click "Details" next to "Builds" in the list, it will open a page similar to [this page](https://s3.amazonaws.com/clickhouse-test-reports/46793/b460eb70bf29b19eadd19a1f959b15d186705394/clickhouse_build_check/report.html)
- Find the rows with compiler = "clang-*-aarch64" - there are multiple rows.
- Click "Details" next to "Builds" in the list; it will open a page similar to [this page](https://s3.amazonaws.com/clickhouse-test-reports/46793/b460eb70bf29b19eadd19a1f959b15d186705394/clickhouse_build_check/report.html).
- Find the rows with compiler = "clang-*-aarch64"; there are multiple rows.
- Download the artifacts for these builds.
2 changes: 1 addition & 1 deletion docs/guides/developer/mutations.md
@@ -89,7 +89,7 @@

## Lightweight deletes {#lightweight-deletes}

Another option for deleting rows it to use the `DELETE FROM` command, which is referred to as a **lightweight delete**. The deleted rows are marked as deleted immediately and will be automatically filtered out of all subsequent queries, so you do not have to wait for a merging of parts or use the `FINAL` keyword. Cleanup of data happens asynchronously in the background.
Another option for deleting rows is to use the `DELETE FROM` command, which is referred to as a **lightweight delete**. The deleted rows are marked as deleted immediately and will be automatically filtered out of all subsequent queries, so you do not have to wait for a merging of parts or use the `FINAL` keyword. Cleanup of data happens asynchronously in the background.

``` sql
DELETE FROM [db.]table [ON CLUSTER cluster] [WHERE expr]
4 changes: 2 additions & 2 deletions docs/guides/inserting-data.md
@@ -137,7 +137,7 @@ Unlike many traditional databases, ClickHouse supports an HTTP interface.
Users can use this for both inserting and querying data, using any of the above formats.
This is often preferable to ClickHouse's native protocol as it allows traffic to be easily switched with load balancers.
We expect small differences in insert performance with the native protocol, which incurs a little less overhead.
Existing clients use either of these protocols ( in some cases both e.g. the Go client).
Existing clients use either of these protocols (in some cases both, e.g., the Go client).
The native protocol does allow query progress to be easily tracked.
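As a sketch, an insert-then-query flow over the HTTP interface might look like this (assumes a server listening on the default port 8123; the table name is illustrative):

```shell
# Insert a row via HTTP; the query is sent in the request body.
echo "INSERT INTO events VALUES (now(), 'page_view', 1)" | \
    curl "http://localhost:8123/" --data-binary @-

# Query it back; a FORMAT clause could be appended to control the response encoding.
curl "http://localhost:8123/?query=SELECT+count()+FROM+events"
```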

See [HTTP Interface](/interfaces/http) for further details.
Expand All @@ -149,7 +149,7 @@ For loading data from Postgres, users can use:
- `PeerDB by ClickHouse`, an ETL tool specifically designed for PostgreSQL database replication. This is available in both:
- ClickHouse Cloud - available through our [new connector](/integrations/clickpipes/postgres) in ClickPipes, our managed ingestion service.
- Self-managed - via the [open-source project](https://github.com/PeerDB-io/peerdb).
- The [PostgreSQL table engine](/integrations/postgresql#using-the-postgresql-table-engine) to read data directly as shown in previous examples. Typically appropriate if batch replication based on a known watermark, e.g., timestamp, is sufficient or if it's a one-off migration. This approach can scale to 10's millions of rows. Users looking to migrate larger datasets should consider multiple requests, each dealing with a chunk of the data. Staging tables can be used for each chunk prior to its partitions being moved to a final table. This allows failed requests to be retried. For further details on this bulk-loading strategy, see here.
- The [PostgreSQL table engine](/integrations/postgresql#using-the-postgresql-table-engine) to read data directly as shown in previous examples. Typically appropriate if batch replication based on a known watermark, e.g., timestamp, is sufficient or if it's a one-off migration. This approach can scale to tens of millions of rows. Users looking to migrate larger datasets should consider multiple requests, each dealing with a chunk of the data. Staging tables can be used for each chunk prior to its partitions being moved to a final table. This allows failed requests to be retried. For further details on this bulk-loading strategy, see here.
- Data can be exported from PostgreSQL in CSV format. This can then be inserted into ClickHouse from either local files or via object storage using table functions.
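The table-engine route in the list above might look like this (a sketch; the host, credentials, schema, and watermark column are placeholders):

```sql
CREATE TABLE pg_orders
(
    id     UInt64,
    amount Decimal(10, 2)
)
ENGINE = PostgreSQL('postgres-host:5432', 'shop', 'orders', 'pg_user', 'pg_password');

-- Batch-copy into a native ClickHouse table, filtered by a known watermark.
INSERT INTO orders
SELECT * FROM pg_orders
WHERE id > (SELECT max(id) FROM orders);
```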

:::note Need help inserting large datasets?